
Detecting When LLMs Change
A statistical approach to monitor black-box language models
This research presents a method for detecting modifications to black-box Large Language Models by analyzing the statistical distributions of linguistic features in their generated text.
- Monitors LLMs for changes without requiring access to internal model parameters
- Uses statistical tests to compare distributions of linguistic features across text samples (see the sketch below)
- Can detect both intentional updates and security threats like prompt injections
- Enables developers to maintain security and reliability when using third-party LLM services
For security teams, this approach offers a practical way to detect when a third-party foundation model has changed, helping prevent unexpected behavior in production applications.
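As a rough illustration of the core idea, the sketch below extracts two simple linguistic features from each text (mean word length and type-token ratio) and compares their distributions between a baseline corpus and a fresh sample using a two-sample Kolmogorov-Smirnov test. The feature set, the choice of test, the Bonferroni correction, and the `detect_change` helper are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of distribution-based change detection for a black-box LLM.
# Assumed design: per-text scalar features + a two-sample KS test per feature.
from scipy.stats import ks_2samp

def mean_word_length(text: str) -> float:
    """Average word length: one simple linguistic feature."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0

def type_token_ratio(text: str) -> float:
    """Lexical diversity: unique words divided by total words."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

# Hypothetical helper name; the features and threshold are illustrative.
FEATURES = {
    "mean_word_length": mean_word_length,
    "type_token_ratio": type_token_ratio,
}

def detect_change(baseline_texts, current_texts, alpha=0.01):
    """Flag a likely model change if any feature distribution shifts.

    Runs a two-sample Kolmogorov-Smirnov test per feature and applies
    a Bonferroni correction across the feature set.
    """
    corrected_alpha = alpha / len(FEATURES)
    results = {}
    for name, fn in FEATURES.items():
        baseline = [fn(t) for t in baseline_texts]
        current = [fn(t) for t in current_texts]
        _, p_value = ks_2samp(baseline, current)
        results[name] = (p_value, p_value < corrected_alpha)
    changed = any(flagged for _, flagged in results.values())
    return changed, results
```

In practice, both samples would be gathered by replaying the same fixed prompt set against the baseline snapshot and the live endpoint, so that any detected distribution shift can be attributed to the model rather than to differing inputs.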
Paper: You've Changed: Detecting Modification of Black-Box Large Language Models