Privacy-Preserving Synthetic Text Generation

Enhancing data privacy without costly LLM fine-tuning

This research introduces CTCL (Co-Training with Contrastive Learning), a novel technique that generates high-quality synthetic text data under formal differential privacy guarantees, without the computational expense of fine-tuning billion-scale large language models.
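
The core recipe can be sketched as follows: apply differentially private training (DP-SGD, i.e. per-sample gradient clipping plus calibrated Gaussian noise) to a small generator instead of a billion-scale LLM, then sample synthetic text freely, since differential privacy is preserved under post-processing. The minimal sketch below uses the Opacus library; the toy model, data, and hyperparameters are illustrative assumptions, not the paper's configuration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Stand-in for a small (~100M-parameter) generator; the real pipeline would
# fine-tune a lightweight language model, not a single linear layer.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randn(64, 16)),
                    batch_size=8)

# Wrap model, optimizer, and loader so every step clips per-sample gradients
# and adds Gaussian noise -- the standard DP-SGD mechanism.
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.1,  # noise scale relative to the clipping bound
    max_grad_norm=1.0,     # per-sample L2 clipping bound
)

for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
# Anything sampled from the trained model now inherits the DP guarantee.
```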

  • Creates privacy-preserving synthetic text data with significantly reduced computational costs
  • Employs an innovative co-training approach with contrastive learning for better data quality
  • Outperforms existing prompt-based methods by making more efficient use of the private data
  • Maintains strong differential privacy guarantees while generating useful training data (a minimal sketch follows this list)
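
To make the guarantee in the last point concrete, the sketch below shows one common ingredient of such pipelines: releasing only a noised topic histogram of the private corpus via the Gaussian mechanism, then steering generation with the privatized statistics rather than the raw ones. The function name and numbers are hypothetical, not taken from the paper.

```python
import numpy as np

def dp_topic_histogram(counts, epsilon, delta, rng):
    """Release a topic histogram under (epsilon, delta)-DP with the Gaussian
    mechanism. Each private document contributes to exactly one topic, so the
    L2 sensitivity of the count vector is 1. Valid for epsilon < 1."""
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    noisy = np.clip(noisy, 0.0, None)   # counts cannot be negative
    return noisy / noisy.sum()          # normalize to a distribution

rng = np.random.default_rng(0)
private_counts = np.array([120.0, 75.0, 40.0, 15.0])  # docs per topic (toy data)
dist = dp_topic_histogram(private_counts, epsilon=0.5, delta=1e-5, rng=rng)

# Downstream generation only ever sees `dist`, never the raw counts:
# sample topics in proportion to the DP histogram and condition the
# (DP-trained) generator on each sampled topic.
topics = rng.choice(len(dist), size=10, p=dist)
print(dist, topics)
```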

Key security implications: organizations with limited computational resources can now adopt privacy-preserving synthetic data, expanding the practical reach of differential privacy in highly sensitive domains such as healthcare and finance.

Paper: Synthesizing Privacy-Preserving Text Data via Finetuning without Finetuning Billion-Scale LLMs
