Detecting LLM Jailbreaks through Geometry

CurvaLID is a new security framework that identifies adversarial prompts by analyzing their geometric properties in LLM embedding spaces, enabling more secure AI deployment.

Leverages the distinct curvature profiles of malicious prompts to detect attacks
Operates as a pre-processing filter without modifying the underlying model
Achieves high detection accuracy while maintaining performance on legitimate prompts
Provides a computationally efficient defense mechanism suitable for real-world applications
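
To make the pre-processing idea above concrete, here is a minimal sketch of a curvature-based prompt filter. It is an illustration under stated assumptions, not CurvaLID's actual method: the summary does not define the curvature measure, so the sketch substitutes a simple discrete proxy (the mean turning angle along a prompt's sequence of embedding vectors). The names `turning_angles`, `mean_curvature`, `make_filter`, the `embed` callable, and the threshold rule are all hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def _sub(a: Vector, b: Vector) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def turning_angles(points: Sequence[Vector]) -> List[float]:
    """Angles (radians) between consecutive steps of an embedding trajectory.

    Assumption: this turning angle is only a stand-in for "curvature";
    the framework's own definition is not given in the summary.
    """
    angles = []
    for p0, p1, p2 in zip(points, points[1:], points[2:]):
        u, v = _sub(p1, p0), _sub(p2, p1)
        nu, nv = _norm(u), _norm(v)
        if nu == 0.0 or nv == 0.0:
            continue  # repeated point: direction is undefined, skip it
        cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
        angles.append(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error
    return angles


def mean_curvature(points: Sequence[Vector]) -> float:
    """Average turning angle; 0.0 when fewer than three usable points exist."""
    angles = turning_angles(points)
    return sum(angles) / len(angles) if angles else 0.0


def make_filter(
    embed: Callable[[str], Sequence[Vector]], threshold: float
) -> Callable[[str], bool]:
    """Build a filter that runs before the model and never modifies it.

    Assumptions: `embed` maps a prompt to a sequence of embedding vectors,
    and a score above `threshold` marks the prompt as suspicious. Both the
    direction of the comparison and the threshold value are placeholders.
    """
    def is_suspicious(prompt: str) -> bool:
        return mean_curvature(embed(prompt)) > threshold

    return is_suspicious
```

In use, a deployment would call `is_suspicious(prompt)` first and forward the prompt to the LLM only when it returns `False`. A real detector would more likely train a classifier on such geometric features than rely on a single fixed threshold.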

This research matters for security because it addresses a fundamental vulnerability in LLMs: adversarial prompts that circumvent AI safety measures. Filtering such prompts before they reach the model could stop malicious actors from jailbreaking it while preserving its functionality for legitimate users.

CURVALID: Geometrically-guided Adversarial Prompt Detection