Mapping LLM Vulnerabilities

This research presents the first comprehensive taxonomy of jailbreak vulnerabilities in Large Language Models, categorizing ways attackers bypass AI safety guardrails.

Four primary attack vectors identified: model mechanics exploitation, context manipulation, cognitive biases, and direct prompting
Reveals how attackers exploit fundamental conflicts between helpfulness and safety alignment
Provides a structured framework for understanding and addressing security weaknesses in commercial and open-source LLMs

This classification system enables security teams to develop more comprehensive defenses by understanding the full spectrum of potential attacks against AI systems.

A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models