
Detecting LLM Training Data Exposure
New Attack Method Requires Only Generated Outputs
Researchers demonstrate how to determine whether specific data was used to train large language models, even when an attacker can observe only the model's generated text.
- Introduces PETAL, a novel label-only membership inference attack
- Works without needing full model logits, only generated text (see the sketch after this list)
- Carries significant privacy and security implications for deployed LLMs
- Shows that even commercial models behind restricted APIs remain vulnerable
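To make the label-only setting concrete, the sketch below scores a candidate passage using nothing but text returned by a generation API: each true token is compared with the model's own prediction via embedding similarity, and the average similarity is thresholded. This is a hypothetical illustration of the general idea, not the paper's PETAL implementation; the `generate_next` and `embed` helpers and the 0.8 threshold are assumptions.

```python
# Hypothetical sketch of a label-only membership inference score.
# Assumes a black-box API that returns only generated text (no logits).
# generate_next, embed, and the threshold are illustrative placeholders,
# not the authors' implementation.
from typing import Callable, List

import numpy as np


def label_only_membership_score(
    tokens: List[str],
    generate_next: Callable[[str], str],
    embed: Callable[[str], np.ndarray],
) -> float:
    """Average per-token semantic similarity between the candidate text
    and the model's predictions, computed only from generated text.

    Training members tend to be reproduced more faithfully, so a higher
    score suggests the passage was seen during training.
    """
    if len(tokens) < 2:
        return 0.0
    similarities = []
    for i in range(1, len(tokens)):
        prefix = " ".join(tokens[:i])
        predicted = generate_next(prefix)  # label-only access: text, not logits
        true_vec, pred_vec = embed(tokens[i]), embed(predicted)
        cosine = float(
            np.dot(true_vec, pred_vec)
            / (np.linalg.norm(true_vec) * np.linalg.norm(pred_vec) + 1e-8)
        )
        similarities.append(cosine)
    return float(np.mean(similarities))


def is_member(score: float, threshold: float = 0.8) -> bool:
    """Decide membership; the threshold would be calibrated on known data."""
    return score >= threshold
```

A logit-level attack would use the per-token probabilities directly; here the embedding similarity acts as a surrogate for them, which is what makes access to generated text alone sufficient.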
This research matters for security professionals because it shows that attackers can infer whether specific records were part of a model's training data with far fewer privileges than previously thought necessary, potentially enabling privacy violations at scale.
Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models