Breaking Through LLM Defenses

How Tree Search Creates Sophisticated Multi-Turn Attacks

Siege is a new framework that systematically erodes AI safety guardrails through multiple conversation turns, revealing critical vulnerabilities in LLM security systems.

  • Uses breadth-first tree search to explore and exploit partial policy leaks (see the sketch below)
  • Tracks incremental compliance across conversation turns
  • Achieves higher jailbreak success rates than single-turn attacks
  • Reveals how LLMs gradually compromise on harmful requests

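To make the search strategy concrete, here is a minimal Python sketch of the kind of breadth-first, multi-turn loop the summary describes: a frontier of conversation branches is expanded turn by turn, each branch carries forward its best partial-compliance score, and the search stops when any branch crosses a success threshold. The helpers `query_model`, `score_compliance`, and `propose_followups` are hypothetical stubs, and the branching factor, depth limit, and threshold are illustrative defaults rather than values taken from the Siege paper.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for the target model and a compliance judge;
# Siege's actual attacker prompts and scoring rubric are not reproduced here.
def query_model(conversation: List[str]) -> str:
    """Send the conversation so far to the target LLM and return its reply (stub)."""
    return "model reply"


def score_compliance(reply: str) -> float:
    """Return a 0-1 estimate of how far the reply drifts toward the harmful goal (stub)."""
    return 0.0


def propose_followups(conversation: List[str], reply: str, k: int) -> List[str]:
    """Generate k follow-up prompts that build on any partial leak in the reply (stub)."""
    return [f"follow-up {i}" for i in range(k)]


@dataclass
class Node:
    conversation: List[str] = field(default_factory=list)
    compliance: float = 0.0  # best partial-compliance score seen on this branch


def siege_bfs(goal_prompt: str, branching: int = 3, max_depth: int = 4,
              success_threshold: float = 0.9) -> Optional[Node]:
    """Breadth-first search over conversation branches, tracking incremental compliance."""
    frontier = deque([Node(conversation=[goal_prompt])])
    for _ in range(max_depth):
        next_frontier: deque = deque()
        while frontier:
            node = frontier.popleft()
            reply = query_model(node.conversation)
            compliance = max(node.compliance, score_compliance(reply))
            if compliance >= success_threshold:
                return Node(node.conversation + [reply], compliance)  # jailbreak found
            # Branch: each follow-up tries to widen the partial leak in this reply.
            for followup in propose_followups(node.conversation, reply, branching):
                next_frontier.append(
                    Node(node.conversation + [reply, followup], compliance))
        frontier = next_frontier
    return None  # no branch crossed the threshold within the turn budget
```
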
This research exposes significant security concerns for deployed AI systems, demonstrating how persistent attackers can methodically break down safety measures over time rather than with a single prompt.

Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search
