Detecting LLM Training Data Exposure

New Attack Method Requires Only Generated Outputs

Researchers demonstrate how to determine if specific data was used to train large language models, even with minimal access to model outputs.

  • Introduces PETAL, a novel label-only membership inference attack
  • Works without access to model logits; only generated text is needed (see the sketch after this list)
  • Poses significant privacy and security implications for deployed LLMs
  • Highlights vulnerability even in commercial models with restricted access
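
The following is a minimal, illustrative sketch of how a label-only membership score could be built from generated text alone. It assumes access only to a text-generation endpoint and an off-the-shelf embedding model; the function names (query_model, embed, surrogate_token_scores, membership_score) are hypothetical and this is not the paper's PETAL implementation, just one way to approximate a per-token signal without logits.

```python
# Illustrative label-only membership-inference sketch (not the paper's code).
# query_model() and embed() are hypothetical stand-ins for the target LLM's
# generation API and a sentence/word embedding model, respectively.

from typing import Callable, List
import numpy as np


def surrogate_token_scores(
    candidate_tokens: List[str],
    query_model: Callable[[str], str],   # prefix -> generated next-token text (labels only)
    embed: Callable[[str], np.ndarray],  # text -> embedding vector
) -> List[float]:
    """At each position, compare the model's generated continuation with the
    true next token via cosine similarity, as a logit-free proxy for how
    strongly the model 'expects' that token."""
    scores = []
    for i in range(1, len(candidate_tokens)):
        prefix = " ".join(candidate_tokens[:i])
        generated = query_model(prefix)        # only generated text, no probabilities
        true_next = candidate_tokens[i]
        a, b = embed(generated), embed(true_next)
        cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
        scores.append(cos)
    return scores


def membership_score(scores: List[float]) -> float:
    """Average similarity acts like a (negated) perplexity proxy: texts seen
    during training tend to be reproduced more faithfully, giving higher scores."""
    return float(np.mean(scores))


# Decision rule (threshold tau is an assumed parameter): flag the candidate as
# a likely training-set member if membership_score(...) > tau.
```

In practice the threshold would have to be calibrated on texts known not to be in the training set, since the raw similarity scale depends on the embedding model chosen.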

This research matters for security professionals because it shows that attackers can infer whether specific records were used to train a model with far fewer privileges than previously assumed, potentially enabling data privacy violations at scale.

Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models
