Detecting LLM Training Data Exposure

New Attack Method Requires Only Generated Outputs

Researchers demonstrate how to determine if specific data was used to train large language models, even with minimal access to model outputs.

  • Introduces PETAL, a novel label-only membership inference attack
  • Works without access to model logits; only generated text is needed (see the sketch after this list)
  • Poses significant privacy and security implications for deployed LLMs
  • Highlights vulnerability even in commercial models with restricted access
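
The following is a minimal, illustrative sketch of how a label-only membership score could be built from generated text alone. It assumes access only to a text-generation endpoint and an off-the-shelf embedding model; the function names (query_model, embed, surrogate_token_scores, membership_score) are hypothetical and this is not the paper's PETAL implementation, just one way to approximate a per-token signal without logits.

```python
# Illustrative label-only membership-inference sketch (not the paper's code).
# query_model() and embed() are hypothetical stand-ins for the target LLM's
# generation API and a sentence/word embedding model, respectively.

from typing import Callable, List
import numpy as np


def surrogate_token_scores(
    candidate_tokens: List[str],
    query_model: Callable[[str], str],   # prefix -> generated next-token text (labels only)
    embed: Callable[[str], np.ndarray],  # text -> embedding vector
) -> List[float]:
    """At each position, compare the model's generated continuation with the
    true next token via cosine similarity, as a logit-free proxy for how
    strongly the model 'expects' that token."""
    scores = []
    for i in range(1, len(candidate_tokens)):
        prefix = " ".join(candidate_tokens[:i])
        generated = query_model(prefix)        # only generated text, no probabilities
        true_next = candidate_tokens[i]
        a, b = embed(generated), embed(true_next)
        cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
        scores.append(cos)
    return scores


def membership_score(scores: List[float]) -> float:
    """Average similarity acts like a (negated) perplexity proxy: texts seen
    during training tend to be reproduced more faithfully, giving higher scores."""
    return float(np.mean(scores))


# Decision rule (threshold tau is an assumed parameter): flag the candidate as
# a likely training-set member if membership_score(...) > tau.
```

In practice the threshold would have to be calibrated on texts known not to be in the training set, since the raw similarity scale depends on the embedding model chosen.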

This research matters for security professionals because it shows that attackers can infer whether specific records were used to train a model with far fewer privileges than previously assumed, potentially enabling data privacy violations at scale.

Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models
