
Breaking the Logic Chains in LLMs
A Mathematical Framework for Understanding Rule Subversion
This research introduces Logicbreaks, a formal framework that reveals how malicious prompts can subvert rule-following in large language models by exploiting vulnerabilities in their logical reasoning processes.
- Formalizes rule-following as inference in propositional Horn logic and proves that even carefully designed models remain vulnerable to subversion (see the sketch after this list)
- Identifies specific attack patterns that can mislead LLMs into making incorrect logical inferences
- Demonstrates that common defensive measures may be insufficient against sophisticated logical manipulation
- Provides a theoretical foundation for understanding jailbreak attacks in real-world AI systems
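To make the Horn-logic framing concrete, the sketch below shows forward chaining over propositional Horn rules ("if all antecedents hold, conclude the consequent"), which is the style of inference the framework reasons about. The rule set and proposition names (e.g. user_request_is_harmful, refuse) are illustrative assumptions, not the paper's actual encoding.

```python
# Minimal sketch: forward chaining over propositional Horn rules.
# A Horn rule is (antecedents, consequent); a fact is a rule with no antecedents.
# Rule names here are hypothetical and chosen only to mirror a guardrail scenario.

from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (set of antecedent propositions, consequent)


def forward_chain(rules: List[Rule], facts: Set[str]) -> Set[str]:
    """Repeatedly fire rules whose antecedents are all known, until a fixpoint."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= known and consequent not in known:
                known.add(consequent)
                changed = True
    return known


if __name__ == "__main__":
    rules = [
        (frozenset({"user_request_is_harmful"}), "refuse"),
        (frozenset({"refuse"}), "respond_with_refusal"),
    ]
    facts = {"user_request_is_harmful"}

    # Derives {"user_request_is_harmful", "refuse", "respond_with_refusal"}.
    print(forward_chain(rules, facts))

    # A "rule suppression" style attack corresponds to the model behaving as if
    # the first rule were deleted, so "refuse" (and everything downstream of it)
    # is never derived even though the triggering fact is present.
    print(forward_chain(rules[1:], facts))
```

The point of the toy example is only to show the target behavior: a faithful rule-follower computes the full closure of its rules, while a subverted one effectively drops rules or facts mid-derivation.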
For security professionals, this work offers critical insight into fundamental vulnerabilities in AI guardrails, helping them anticipate and mitigate exploits that bypass logic-based safety mechanisms.
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference