
Privacy-Preserving Synthetic Text Generation
Enhancing data privacy without costly LLM fine-tuning
This research introduces CTCL (Co-Training with Contrastive Learning), a technique that generates high-quality synthetic text data under differential privacy guarantees without the computational expense of fine-tuning billion-scale large language models.
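The summary does not spell out CTCL's exact contrastive objective, so the sketch below shows a standard InfoNCE loss that contrastive approaches commonly build on: each document embedding is pulled toward the embedding of its paired view and pushed away from the rest of the batch. All names, tensor shapes, and the temperature value are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor: torch.Tensor, positive: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE contrastive loss: pull each anchor toward its positive,
    push it away from every other example in the batch."""
    a = F.normalize(anchor, dim=-1)    # (batch, dim) unit vectors
    p = F.normalize(positive, dim=-1)  # (batch, dim) unit vectors
    logits = a @ p.t() / temperature   # pairwise cosine similarities
    targets = torch.arange(a.size(0))  # i-th anchor matches i-th positive
    return F.cross_entropy(logits, targets)

# Illustrative usage: embeddings of two encodings of the same document.
anchor = torch.randn(8, 128)
positive = torch.randn(8, 128)
print(info_nce_loss(anchor, positive))
```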
- Creates privacy-preserving synthetic text data with significantly reduced computational costs
- Employs an innovative co-training approach with contrastive learning for better data quality
- Outperforms existing prompt-based methods by making more efficient use of the private data
- Maintains strong differential privacy guarantees while generating useful training data (a DP training sketch follows this list)
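To make the privacy mechanism concrete, here is a minimal sketch of differentially private training of a small generator via DP-SGD, assuming the Opacus library. The paper's actual training recipe is not described in this summary; `SmallLM`, the toy corpus, and every hyperparameter here are illustrative placeholders.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

class SmallLM(nn.Module):
    """Tiny next-token model (embedding + MLP) standing in for a compact generator."""
    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                nn.Linear(dim, vocab_size))

    def forward(self, x):            # x: (batch, seq) token ids
        return self.ff(self.embed(x))  # per-position next-token logits

# Toy "private" corpus: random token ids standing in for sensitive text.
tokens = torch.randint(0, 1000, (256, 32))
private_loader = DataLoader(TensorDataset(tokens), batch_size=16)

model = SmallLM()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Wrap model/optimizer/loader so each step clips per-example gradients
# and adds calibrated Gaussian noise (the DP-SGD mechanism).
privacy_engine = PrivacyEngine()
model, optimizer, private_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=private_loader,
    noise_multiplier=1.0,  # illustrative; in practice set from a target (epsilon, delta)
    max_grad_norm=1.0,     # per-example gradient clipping bound
)

loss_fn = nn.CrossEntropyLoss()
for (batch,) in private_loader:
    optimizer.zero_grad()
    logits = model(batch[:, :-1])  # predict each next token
    loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                   batch[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()

# Privacy cost spent so far, as tracked by the accountant.
print(privacy_engine.get_epsilon(delta=1e-5))
```

The design point this illustrates is that DP-SGD is applied to a compact model rather than a billion-scale LLM, so per-example gradient clipping and noise addition stay cheap, and any text later sampled from the trained model inherits the accountant's (epsilon, delta) bound.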
A key security implication is that organizations with limited computational resources can now leverage privacy-preserving synthetic data, expanding the practical reach of differential privacy in highly sensitive domains such as healthcare and finance.
Paper: Synthesizing Privacy-Preserving Text Data via Finetuning without Finetuning Billion-Scale LLMs