
Detecting When LLMs Change
A statistical approach to monitor black-box language models
This research presents a method for detecting modifications to black-box Large Language Models by analyzing the statistical distributions of linguistic features in their generated text.
- Monitors LLMs for changes without requiring access to internal model parameters
- Uses statistical tests to compare distributions of linguistic features across text samples (see the sketch below)
- Can detect both intentional updates and security threats like prompt injections
- Enables developers to maintain security and reliability when using third-party LLM services
For security teams, this approach offers a practical way to detect when a third-party foundation model has changed, helping prevent unexpected behavior in production applications.
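As a rough illustration of the core idea, the sketch below extracts two simple linguistic features from each text (mean word length and type-token ratio) and compares their distributions between a baseline corpus and a fresh sample using a two-sample Kolmogorov-Smirnov test. The feature set, the choice of test, the Bonferroni correction, and the `detect_change` helper are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of distribution-based change detection for a black-box LLM.
# Assumed design: per-text scalar features + a two-sample KS test per feature.
from scipy.stats import ks_2samp

def mean_word_length(text: str) -> float:
    """Average word length: one simple linguistic feature."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0

def type_token_ratio(text: str) -> float:
    """Lexical diversity: unique words divided by total words."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

# Hypothetical helper name; the features and threshold are illustrative.
FEATURES = {
    "mean_word_length": mean_word_length,
    "type_token_ratio": type_token_ratio,
}

def detect_change(baseline_texts, current_texts, alpha=0.01):
    """Flag a likely model change if any feature distribution shifts.

    Runs a two-sample Kolmogorov-Smirnov test per feature and applies
    a Bonferroni correction across the feature set.
    """
    corrected_alpha = alpha / len(FEATURES)
    results = {}
    for name, fn in FEATURES.items():
        baseline = [fn(t) for t in baseline_texts]
        current = [fn(t) for t in current_texts]
        _, p_value = ks_2samp(baseline, current)
        results[name] = (p_value, p_value < corrected_alpha)
    changed = any(flagged for _, flagged in results.values())
    return changed, results
```

In practice, both samples would be gathered by replaying the same fixed prompt set against the baseline snapshot and the live endpoint, so that any detected distribution shift can be attributed to the model rather than to differing inputs.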
Paper: You've Changed: Detecting Modification of Black-Box Large Language Models