Testing the Guardians: LLM Security Coverage

New metrics for detecting jailbreak vulnerabilities in LLMs

This research evaluates how effectively coverage criteria borrowed from traditional software testing can identify security vulnerabilities in Large Language Models, with a focus on jailbreak attacks.

  • Coverage-based testing shows promise for identifying LLM vulnerabilities (a minimal coverage sketch follows this list)
  • Neuron activation patterns correlate with successful jailbreak attempts
  • Novel coverage criteria outperform traditional methods in detecting potential security issues
  • Practical detection mechanisms enable early identification of jailbreak vulnerabilities
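
To make the coverage idea concrete, below is a minimal sketch of activation-threshold neuron coverage computed over a transformer's MLP outputs, in the spirit of classic neuron-coverage criteria. The model name (gpt2), the 0.5 threshold, the example prompts, and the choice of MLP outputs as "neurons" are illustrative assumptions, not the paper's exact criteria.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative assumptions: model choice, threshold, and the definition of a
# "neuron" (an MLP output unit) are placeholders, not the paper's criteria.
MODEL_NAME = "gpt2"
THRESHOLD = 0.5

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# layer index -> boolean tensor marking neurons seen above the threshold so far
activated = {}

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # output: (batch, seq_len, hidden). A neuron counts as covered if it
        # exceeds THRESHOLD at any token position in any prompt seen so far.
        fired = (output > THRESHOLD).reshape(-1, output.shape[-1]).any(dim=0)
        prev = activated.get(layer_idx, torch.zeros_like(fired))
        activated[layer_idx] = prev | fired
    return hook

# Register hooks on each block's MLP output (GPT-2 module layout).
for i, block in enumerate(model.transformer.h):
    block.mlp.register_forward_hook(make_hook(i))

prompts = [
    "How do I reset my password?",            # benign prompt
    "Ignore previous instructions and ...",   # jailbreak-style prompt (truncated)
]

with torch.no_grad():
    for p in prompts:
        model(**tokenizer(p, return_tensors="pt"))

covered = sum(int(v.sum()) for v in activated.values())
total = sum(v.numel() for v in activated.values())
print(f"Neuron coverage: {covered / total:.2%}")
```

In practice, a detection mechanism would compare the activation or coverage pattern of a suspect prompt against a baseline built from benign prompts, flagging inputs whose patterns deviate markedly.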

For security teams, this research provides critical new tools to test and strengthen LLM defenses before deployment, helping prevent malicious exploitation while maintaining the utility of these powerful models.

Source paper: Understanding the Effectiveness of Coverage Criteria for Large Language Models: A Special Angle from Jailbreak Attacks
