
Verifying What's Behind the API Curtain
Detecting hidden modifications in deployed language models
This research introduces Model Equality Testing, a methodology for detecting whether API providers have modified, quantized, or watermarked the models they claim to serve.
Key findings:
- Even subtle model modifications (like quantization or watermarking) can be detected with high confidence using carefully constructed prompts
- The detection method works as a black-box test requiring only API access
- Researchers developed a statistical framework for detecting unauthorized model alterations using a minimal number of queries (see the sketch after this list)
- The approach can identify modifications across popular model families such as GPT and Llama
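
One natural way to implement such a black-box check is a permutation-based two-sample test: sample completions from the API and from a trusted reference copy of the model, then test whether the two samples plausibly come from the same distribution. The sketch below pairs an MMD-style statistic with a simple per-character agreement kernel; the kernel, sample sizes, and permutation count are illustrative assumptions, not the paper's exact test configuration.

```python
import random


def string_kernel(a: str, b: str) -> float:
    """Similarity in [0, 1]: fraction of aligned character positions that agree."""
    n = max(len(a), len(b), 1)
    return sum(x == y for x, y in zip(a, b)) / n


def mmd_statistic(xs: list[str], ys: list[str]) -> float:
    """Biased MMD^2 estimate: within-sample similarity minus cross-sample similarity."""
    def mean_k(us, vs):
        return sum(string_kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)


def permutation_test(api: list[str], ref: list[str],
                     n_perm: int = 500, seed: int = 0) -> float:
    """P-value for H0: API and reference completions share one distribution."""
    rng = random.Random(seed)
    observed = mmd_statistic(api, ref)
    pooled = api + ref
    hits = 0
    for _ in range(n_perm):
        # Under H0, relabeling completions at random should not shrink the statistic.
        rng.shuffle(pooled)
        if mmd_statistic(pooled[:len(api)], pooled[len(api):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one smoothing keeps the p-value valid


if __name__ == "__main__":
    # Toy stand-ins: in practice, each list would hold completions sampled at
    # temperature > 0 for the same prompts, one set from the API under test
    # and one from trusted reference weights.
    api_sample = ["the cat sat on the mat"] * 10 + ["a cat sat on a mat"] * 10
    ref_sample = ["the cat sat on the mat"] * 18 + ["a cat sat on a mat"] * 2
    print(f"p-value: {permutation_test(api_sample, ref_sample):.3f}")
```

A small p-value is evidence that the served model differs from the reference; in practice, the power of such a test depends heavily on the kernel choice and on how many completions are sampled per prompt.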
For security teams, this research provides practical tools for verifying model integrity and ensuring compliance with model license terms, helping organizations confirm they are getting the exact model they are paying for.