
Exploiting Dialogue History for LLM Attacks
How attackers can manipulate conversation context to bypass safety measures
This research reveals a critical security vulnerability in LLMs that leverages conversation history to execute jailbreak attacks, circumventing standard safety mechanisms.
- Dialogue Injection Attack - A novel technique that manipulates conversation context to bypass safety guardrails
- Historical leverage - Exploits LLMs' reliance on dialogue history rather than single-turn interactions
- High effectiveness - Achieved 87.5% attack success rate on GPT-4 and 100% on Claude-2
- Defense challenges - Standard defenses proved inadequate against this attack vector
This research matters for security because it exposes a fundamental vulnerability in how LLMs process multi-turn conversations, requiring new defense strategies for deployed AI systems.
Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation