Detecting When LLMs Change

A statistical approach to monitor black-box language models

This research presents a method for detecting modifications to black-box Large Language Models by analyzing the statistical distributions of linguistic features in their generated text.

  • Monitors LLMs for changes without requiring access to internal model parameters
  • Uses statistical tests to compare distributions of linguistic features across text samples
  • Can detect both intentional updates and security threats like prompt injections
  • Enables developers to maintain security and reliability when using third-party LLM services
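The comparison described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes mean word length as a stand-in linguistic feature and uses a hand-rolled two-sample Kolmogorov-Smirnov statistic to compare a baseline distribution against fresh samples.

```python
from statistics import mean

def feature_distribution(texts):
    """Hypothetical linguistic feature: mean word length per generated text."""
    return [mean(len(word) for word in text.split()) for text in texts]

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a + b)):
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Baseline generations collected earlier vs. fresh generations today.
baseline = feature_distribution(["the cat sat", "a quick brown fox"])
current = feature_distribution(["the cat sat", "a quick brown fox"])
drift = ks_statistic(baseline, current)  # 0.0 when distributions match
```

In practice one would collect many generations per prompt set, compute several features, and flag the model as changed when the test statistic exceeds a calibrated threshold (or its p-value falls below a chosen significance level).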

For security teams, this approach offers a practical answer to a critical question: when have the foundation models you depend on changed? Detecting such changes early helps prevent unexpected behavior in production applications.

You've Changed: Detecting Modification of Black-Box Large Language Models
