Privacy-Preserving Synthetic Text Generation

Enhancing data privacy without costly LLM fine-tuning

This research introduces CTCL (Co-Training with Contrastive Learning), a novel technique that generates high-quality synthetic text data under formal differential privacy guarantees, without the computational expense of fine-tuning billion-scale large language models.
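
The core recipe can be sketched as follows: apply differentially private training (DP-SGD, i.e. per-sample gradient clipping plus calibrated Gaussian noise) to a small generator instead of a billion-scale LLM, then sample synthetic text freely, since differential privacy is preserved under post-processing. The minimal sketch below uses the Opacus library; the toy model, data, and hyperparameters are illustrative assumptions, not the paper's configuration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Stand-in for a small (~100M-parameter) generator; the real pipeline would
# fine-tune a lightweight language model, not a single linear layer.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randn(64, 16)),
                    batch_size=8)

# Wrap model, optimizer, and loader so every step clips per-sample gradients
# and adds Gaussian noise -- the standard DP-SGD mechanism.
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.1,  # noise scale relative to the clipping bound
    max_grad_norm=1.0,     # per-sample L2 clipping bound
)

for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
# Anything sampled from the trained model now inherits the DP guarantee.
```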

  • Creates privacy-preserving synthetic text data with significantly reduced computational costs
  • Employs an innovative co-training approach with contrastive learning for better data quality
  • Outperforms existing prompt-based methods by making more efficient use of the private data
  • Maintains strong differential privacy guarantees while generating useful training data (a minimal sketch follows this list)
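
To make the guarantee in the last point concrete, the sketch below shows one common ingredient of such pipelines: releasing only a noised topic histogram of the private corpus via the Gaussian mechanism, then steering generation with the privatized statistics rather than the raw ones. The function name and numbers are hypothetical, not taken from the paper.

```python
import numpy as np

def dp_topic_histogram(counts, epsilon, delta, rng):
    """Release a topic histogram under (epsilon, delta)-DP with the Gaussian
    mechanism. Each private document contributes to exactly one topic, so the
    L2 sensitivity of the count vector is 1. Valid for epsilon < 1."""
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    noisy = np.clip(noisy, 0.0, None)   # counts cannot be negative
    return noisy / noisy.sum()          # normalize to a distribution

rng = np.random.default_rng(0)
private_counts = np.array([120.0, 75.0, 40.0, 15.0])  # docs per topic (toy data)
dist = dp_topic_histogram(private_counts, epsilon=0.5, delta=1e-5, rng=rng)

# Downstream generation only ever sees `dist`, never the raw counts:
# sample topics in proportion to the DP histogram and condition the
# (DP-trained) generator on each sampled topic.
topics = rng.choice(len(dist), size=10, p=dist)
print(dist, topics)
```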

Key security implications: organizations with limited computational resources can now adopt privacy-preserving synthetic data, expanding the practical reach of differential privacy in highly sensitive domains such as healthcare and finance.

Paper: Synthesizing Privacy-Preserving Text Data via Finetuning without Finetuning Billion-Scale LLMs
