Uncovering Dataset Contamination in LLMs

A new metric for measuring training data leakage into evaluation sets

This research introduces the Kernel Divergence Score (KDS), a novel metric for quantifying dataset contamination in large language models, addressing a critical reliability issue in AI evaluation.

  • Detects when evaluation datasets overlap with pre-training data, flagging artificially inflated performance metrics
  • Provides a mathematical framework to measure the degree of contamination in benchmark datasets (see the sketch after this list)
  • Helps researchers distinguish between genuine model capabilities and memorization of training examples
  • Enables more trustworthy evaluation of language models across different benchmarks
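
The summary above does not spell out how KDS is computed. As a rough, non-authoritative sketch: one plausible reading is that the score compares kernel similarity matrices of benchmark-sample embeddings taken from the same model before and after fine-tuning on that benchmark, on the intuition that fine-tuning shifts already-memorized (contaminated) samples less than genuinely unseen ones. The RBF kernel, the row-wise KL divergence, and the synthetic embeddings below are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a kernel-divergence-style contamination score.
# Assumption: the score compares kernel similarity matrices of benchmark-sample
# embeddings before and after fine-tuning the model on that benchmark; the RBF
# kernel and the KL-based divergence are illustrative choices, not necessarily
# the paper's exact formulation.
import numpy as np


def rbf_kernel(embeddings: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Pairwise RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = (embeddings ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embeddings @ embeddings.T
    return np.exp(-gamma * np.clip(sq_dists, 0.0, None))


def kernel_divergence_score(emb_before: np.ndarray,
                            emb_after: np.ndarray,
                            gamma: float = 1.0,
                            eps: float = 1e-12) -> float:
    """Average row-wise KL divergence between the two kernel matrices.

    A small score means fine-tuning barely changed the similarity structure
    of the benchmark samples, which is the kind of signal one would read as
    evidence of contamination (the samples were effectively already learned).
    """
    k_before = rbf_kernel(emb_before, gamma)
    k_after = rbf_kernel(emb_after, gamma)
    # Normalize each row into a probability distribution over neighbors.
    p = k_before / (k_before.sum(axis=1, keepdims=True) + eps)
    q = k_after / (k_after.sum(axis=1, keepdims=True) + eps)
    kl_rows = (p * np.log((p + eps) / (q + eps))).sum(axis=1)
    return float(kl_rows.mean())


if __name__ == "__main__":
    # Stand-in embeddings; in practice these would come from the same model's
    # hidden states on each benchmark example, before and after fine-tuning.
    rng = np.random.default_rng(0)
    before = rng.normal(size=(128, 64))
    after_contaminated = before + 0.01 * rng.normal(size=(128, 64))  # barely moves
    after_clean = before + 0.50 * rng.normal(size=(128, 64))         # moves a lot
    print("score (contaminated-like):", kernel_divergence_score(before, after_contaminated))
    print("score (clean-like):       ", kernel_divergence_score(before, after_clean))
```

In an actual application, the embedding extraction, kernel choice, and divergence would follow the paper's definitions; the sketch only conveys the before-versus-after comparison that a kernel-based contamination score of this kind would rely on.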

From a security perspective, this work is important because it helps prevent misleading claims about model performance and helps ensure that evaluations reflect genuine generalization rather than memorization of training examples.

How Contaminated Is Your Benchmark? Quantifying Dataset Leakage in Large Language Models with Kernel Divergence
