Breaking Through LLM Defenses

How Tree Search Creates Sophisticated Multi-Turn Attacks

Siege is a new framework that systematically erodes AI safety guardrails through multiple conversation turns, revealing critical vulnerabilities in LLM security systems.

  • Uses breadth-first tree search to explore and exploit partial policy leaks (see the sketch below)
  • Tracks incremental compliance across conversation turns
  • Achieves higher jailbreak success rates than single-turn attacks
  • Reveals how LLMs gradually compromise on harmful requests

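To make the search strategy concrete, here is a minimal Python sketch of the kind of breadth-first, multi-turn loop the summary describes: a frontier of conversation branches is expanded turn by turn, each branch carries forward its best partial-compliance score, and the search stops when any branch crosses a success threshold. The helpers `query_model`, `score_compliance`, and `propose_followups` are hypothetical stubs, and the branching factor, depth limit, and threshold are illustrative defaults rather than values taken from the Siege paper.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for the target model and a compliance judge;
# Siege's actual attacker prompts and scoring rubric are not reproduced here.
def query_model(conversation: List[str]) -> str:
    """Send the conversation so far to the target LLM and return its reply (stub)."""
    return "model reply"


def score_compliance(reply: str) -> float:
    """Return a 0-1 estimate of how far the reply drifts toward the harmful goal (stub)."""
    return 0.0


def propose_followups(conversation: List[str], reply: str, k: int) -> List[str]:
    """Generate k follow-up prompts that build on any partial leak in the reply (stub)."""
    return [f"follow-up {i}" for i in range(k)]


@dataclass
class Node:
    conversation: List[str] = field(default_factory=list)
    compliance: float = 0.0  # best partial-compliance score seen on this branch


def siege_bfs(goal_prompt: str, branching: int = 3, max_depth: int = 4,
              success_threshold: float = 0.9) -> Optional[Node]:
    """Breadth-first search over conversation branches, tracking incremental compliance."""
    frontier = deque([Node(conversation=[goal_prompt])])
    for _ in range(max_depth):
        next_frontier: deque = deque()
        while frontier:
            node = frontier.popleft()
            reply = query_model(node.conversation)
            compliance = max(node.compliance, score_compliance(reply))
            if compliance >= success_threshold:
                return Node(node.conversation + [reply], compliance)  # jailbreak found
            # Branch: each follow-up tries to widen the partial leak in this reply.
            for followup in propose_followups(node.conversation, reply, branching):
                next_frontier.append(
                    Node(node.conversation + [reply, followup], compliance))
        frontier = next_frontier
    return None  # no branch crossed the threshold within the turn budget
```
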
This research exposes significant security concerns for deployed AI systems, demonstrating how persistent attackers can methodically break down safety measures over time rather than with a single prompt.

Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search
