
Mapping LLM Vulnerabilities
A Systematic Classification of Jailbreak Attack Vectors
This research presents the first comprehensive taxonomy of jailbreak vulnerabilities in Large Language Models, categorizing the ways attackers bypass AI safety guardrails.
- Identifies four primary attack vectors: model mechanics exploitation, context manipulation, cognitive biases, and direct prompting
- Reveals how attackers exploit the fundamental conflict between helpfulness and safety alignment
- Provides a structured framework for understanding and addressing security weaknesses in commercial and open-source LLMs
This classification system enables security teams to develop more comprehensive defenses by understanding the full spectrum of potential attacks against AI systems.
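As a rough illustration of how such a classification could be operationalized, the sketch below (Python) tags incident reports with their taxonomy domain so defenses can be tracked per attack vector. Only the four domain labels come from the summary above; the function, field names, and the example technique are hypothetical placeholders, not categories taken from the paper.

```python
from enum import Enum


class AttackVector(Enum):
    """The four primary attack-vector domains named in the summary above."""
    MODEL_MECHANICS = "model mechanics exploitation"
    CONTEXT_MANIPULATION = "context manipulation"
    COGNITIVE_BIASES = "cognitive biases"
    DIRECT_PROMPTING = "direct prompting"


def classify_report(vector: AttackVector, technique: str) -> dict:
    """Tag an incident report with its taxonomy domain.

    `technique` is free text; the domain is the structured field a
    security team would filter and aggregate on when tracking coverage.
    """
    return {"domain": vector.value, "technique": technique}


# Hypothetical usage: the technique name is an illustrative placeholder.
report = classify_report(AttackVector.CONTEXT_MANIPULATION, "role-play framing")
print(report)
```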
A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models