
Uncovering Dataset Contamination in LLMs
A new metric for measuring training data leakage into evaluation sets
This research introduces the Kernel Divergence Score (KDS), a novel method for quantifying dataset contamination in large language models, addressing a critical reliability issue in AI evaluation.
- Detects when evaluation datasets overlap with pre-training data, flagging artificially inflated performance metrics
- Provides a mathematical framework for measuring the degree of contamination in benchmark datasets (a sketch of one possible formulation follows this list)
- Helps researchers distinguish between genuine model capabilities and memorization of training examples
- Enables more trustworthy evaluation of language models across different benchmarks
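
The summary above does not spell out how KDS is computed, so the following is a minimal illustrative sketch rather than the paper's exact formulation. It assumes you can extract embeddings of the same benchmark examples from a model before and after fine-tuning it on that benchmark; it then builds an RBF kernel (similarity) matrix for each set of embeddings and reports a symmetrized KL divergence between the two kernel structures. The function names (`rbf_kernel`, `kernel_divergence_score`), the kernel choice, and the divergence measure are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Pairwise RBF (Gaussian) kernel matrix over row-vector embeddings."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)


def kernel_divergence_score(emb_before: np.ndarray,
                            emb_after: np.ndarray,
                            gamma: float = 1.0,
                            eps: float = 1e-8) -> float:
    """Illustrative contamination proxy (hypothetical formulation, not the
    paper's exact one): divergence between the kernel similarity structure
    of benchmark embeddings taken from a model before vs. after fine-tuning
    on that benchmark. Intuition: if the benchmark already leaked into
    pre-training, fine-tuning changes the embedding geometry very little,
    so the two kernels stay close and the divergence is small."""
    K_before = rbf_kernel(emb_before, gamma)
    K_after = rbf_kernel(emb_after, gamma)

    # Normalize each kernel row into a probability distribution and compare
    # them with a symmetrized KL divergence, averaged over samples.
    P = K_before / (K_before.sum(axis=1, keepdims=True) + eps)
    Q = K_after / (K_after.sum(axis=1, keepdims=True) + eps)
    kl_pq = np.sum(P * np.log((P + eps) / (Q + eps)), axis=1)
    kl_qp = np.sum(Q * np.log((Q + eps) / (P + eps)), axis=1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))


# Usage sketch with synthetic data: emb_before / emb_after stand in for
# (n_samples, d) embedding matrices of the same benchmark examples,
# extracted from the model before and after fine-tuning on the benchmark.
rng = np.random.default_rng(0)
emb_before = rng.normal(size=(64, 128))
emb_after = emb_before + 0.05 * rng.normal(size=(64, 128))  # small geometry shift
print(f"KDS-style divergence: {kernel_divergence_score(emb_before, emb_after):.4f}")
```

Under this reading, a lower score suggests the benchmark's geometry was already familiar to the model (possible contamination), while a higher score suggests fine-tuning genuinely reshaped the representations; how the actual KDS calibrates and interprets this quantity is specified in the paper itself.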
From a security perspective, this work is important because it helps prevent misleading claims about model performance and ensures that evaluations reflect genuine generalization rather than memorization.