
Security Risks in Code Language Models
Investigating Data Extraction Vulnerabilities Before and After Fine-tuning
This research examines how code language models can be exploited to extract sensitive data from their training datasets, revealing significant security vulnerabilities.
- Pre-trained models can memorize and regurgitate training data when prompted with specific attacks
- Fine-tuning these models doesn't eliminate vulnerabilities—it can create new security risks
- Developers using both pre-trained and fine-tuned models must implement robust privacy safeguards
- Organizations leveraging code models should audit for potential data leakage
This work is critical for security professionals as it highlights how AI models handling code can expose intellectual property, credentials, and other sensitive information embedded in training data.