Mapping LLM Vulnerabilities

Mapping LLM Vulnerabilities

A Systematic Classification of Jailbreak Attack Vectors

This research presents the first comprehensive taxonomy of jailbreak vulnerabilities in Large Language Models, categorizing ways attackers bypass AI safety guardrails.

  • Four primary attack vectors identified: model mechanics exploitation, context manipulation, cognitive biases, and direct prompting
  • Reveals how attackers exploit fundamental conflicts between helpfulness and safety alignment
  • Provides a structured framework for understanding and addressing security weaknesses in commercial and open-source LLMs

This classification system enables security teams to develop more comprehensive defenses by understanding the full spectrum of potential attacks against AI systems.

A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models

148 | 157