
Verifying What's Behind the API Curtain
Detecting hidden modifications in deployed language models
This research introduces Model Equality Testing, a methodology for detecting whether API providers have modified, quantized, or watermarked the models they claim to serve.
Key findings:
- Even subtle model modifications (like quantization or watermarking) can be detected with high confidence using carefully constructed prompts
- The detection method works as a black-box test requiring only API access
- Researchers developed a statistical framework for detecting unauthorized model alterations using a minimal number of queries (see the sketch after this list)
- The approach can identify modifications across popular model families such as GPT and Llama
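
One natural way to implement such a black-box check is a permutation-based two-sample test: sample completions from the API and from a trusted reference copy of the model, then test whether the two samples plausibly come from the same distribution. The sketch below pairs an MMD-style statistic with a simple per-character agreement kernel; the kernel, sample sizes, and permutation count are illustrative assumptions, not the paper's exact test configuration.

```python
import random


def string_kernel(a: str, b: str) -> float:
    """Similarity in [0, 1]: fraction of aligned character positions that agree."""
    n = max(len(a), len(b), 1)
    return sum(x == y for x, y in zip(a, b)) / n


def mmd_statistic(xs: list[str], ys: list[str]) -> float:
    """Biased MMD^2 estimate: within-sample similarity minus cross-sample similarity."""
    def mean_k(us, vs):
        return sum(string_kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)


def permutation_test(api: list[str], ref: list[str],
                     n_perm: int = 500, seed: int = 0) -> float:
    """P-value for H0: API and reference completions share one distribution."""
    rng = random.Random(seed)
    observed = mmd_statistic(api, ref)
    pooled = api + ref
    hits = 0
    for _ in range(n_perm):
        # Under H0, relabeling completions at random should not shrink the statistic.
        rng.shuffle(pooled)
        if mmd_statistic(pooled[:len(api)], pooled[len(api):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one smoothing keeps the p-value valid


if __name__ == "__main__":
    # Toy stand-ins: in practice, each list would hold completions sampled at
    # temperature > 0 for the same prompts, one set from the API under test
    # and one from trusted reference weights.
    api_sample = ["the cat sat on the mat"] * 10 + ["a cat sat on a mat"] * 10
    ref_sample = ["the cat sat on the mat"] * 18 + ["a cat sat on a mat"] * 2
    print(f"p-value: {permutation_test(api_sample, ref_sample):.3f}")
```

A small p-value is evidence that the served model differs from the reference; in practice, the power of such a test depends heavily on the kernel choice and on how many completions are sampled per prompt.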
For security teams, this research provides practical tools for verifying model integrity and ensuring compliance with model license terms, helping organizations confirm they are getting the exact model they are paying for.