Exploiting Dialogue History for LLM Attacks

Exploiting Dialogue History for LLM Attacks

How attackers can manipulate conversation context to bypass safety measures

This research reveals a critical security vulnerability in LLMs that leverages conversation history to execute jailbreak attacks, circumventing standard safety mechanisms.

  • Dialogue Injection Attack - A novel technique that manipulates conversation context to bypass safety guardrails
  • Historical leverage - Exploits LLMs' reliance on dialogue history rather than single-turn interactions
  • High effectiveness - Achieved 87.5% attack success rate on GPT-4 and 100% on Claude-2
  • Defense challenges - Standard defenses proved inadequate against this attack vector

This research matters for security because it exposes a fundamental vulnerability in how LLMs process multi-turn conversations, requiring new defense strategies for deployed AI systems.

Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation

128 | 157