
Uncovering Dataset Contamination in LLMs
A new metric for measuring training data leakage into evaluation sets
This research introduces the Kernel Divergence Score (KDS), a novel method for quantifying dataset contamination in large language models, addressing a critical reliability issue in AI evaluation.
- Detects when evaluation datasets overlap with pre-training data, flagging artificially inflated performance metrics
- Provides a mathematical framework for measuring the degree of contamination in benchmark datasets (a sketch of one possible formulation follows this list)
- Helps researchers distinguish between genuine model capabilities and memorization of training examples
- Enables more trustworthy evaluation of language models across different benchmarks
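
The summary above does not spell out how KDS is computed, so the following is a minimal illustrative sketch rather than the paper's exact formulation. It assumes you can extract embeddings of the same benchmark examples from a model before and after fine-tuning it on that benchmark; it then builds an RBF kernel (similarity) matrix for each set of embeddings and reports a symmetrized KL divergence between the two kernel structures. The function names (`rbf_kernel`, `kernel_divergence_score`), the kernel choice, and the divergence measure are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Pairwise RBF (Gaussian) kernel matrix over row-vector embeddings."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)


def kernel_divergence_score(emb_before: np.ndarray,
                            emb_after: np.ndarray,
                            gamma: float = 1.0,
                            eps: float = 1e-8) -> float:
    """Illustrative contamination proxy (hypothetical formulation, not the
    paper's exact one): divergence between the kernel similarity structure
    of benchmark embeddings taken from a model before vs. after fine-tuning
    on that benchmark. Intuition: if the benchmark already leaked into
    pre-training, fine-tuning changes the embedding geometry very little,
    so the two kernels stay close and the divergence is small."""
    K_before = rbf_kernel(emb_before, gamma)
    K_after = rbf_kernel(emb_after, gamma)

    # Normalize each kernel row into a probability distribution and compare
    # them with a symmetrized KL divergence, averaged over samples.
    P = K_before / (K_before.sum(axis=1, keepdims=True) + eps)
    Q = K_after / (K_after.sum(axis=1, keepdims=True) + eps)
    kl_pq = np.sum(P * np.log((P + eps) / (Q + eps)), axis=1)
    kl_qp = np.sum(Q * np.log((Q + eps) / (P + eps)), axis=1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))


# Usage sketch with synthetic data: emb_before / emb_after stand in for
# (n_samples, d) embedding matrices of the same benchmark examples,
# extracted from the model before and after fine-tuning on the benchmark.
rng = np.random.default_rng(0)
emb_before = rng.normal(size=(64, 128))
emb_after = emb_before + 0.05 * rng.normal(size=(64, 128))  # small geometry shift
print(f"KDS-style divergence: {kernel_divergence_score(emb_before, emb_after):.4f}")
```

Under this reading, a lower score suggests the benchmark's geometry was already familiar to the model (possible contamination), while a higher score suggests fine-tuning genuinely reshaped the representations; how the actual KDS calibrates and interprets this quantity is specified in the paper itself.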
From a security perspective, this work is important because it helps prevent misleading claims about model performance and ensures that evaluations reflect genuine generalization rather than memorization.