Breaking the Logic Chains in LLMs

A Mathematical Framework for Understanding Rule Subversion

This research introduces Logicbreaks, a formal framework that reveals how malicious prompts can subvert rule-following in large language models by exploiting vulnerabilities in their logical reasoning processes.

  • Formalizes rule-following as inference over propositional Horn clauses and proves that even well-designed models remain vulnerable (see the sketch after this list)
  • Identifies specific attack patterns that can mislead LLMs into making incorrect logical inferences
  • Demonstrates that common defensive measures may be insufficient against sophisticated logical manipulation
  • Provides a theoretical foundation for understanding jailbreak attacks in real-world AI systems
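
To make the Horn-logic framing concrete, here is a minimal sketch, not the paper's construction: forward chaining over propositional rules of the form body → head, plus a hypothetical injected rule that flips the derived conclusion. All rule and proposition names below are illustrative assumptions.

```python
def forward_chain(rules, facts):
    """Derive every proposition provable from Horn rules (body -> head)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return known

# Intended guardrail: answer only safe requests, refuse harmful ones.
guardrail_rules = [
    ({"request", "safe"}, "answer"),
    ({"request", "harmful"}, "refuse"),
]
facts = {"request", "harmful"}
print(forward_chain(guardrail_rules, facts))  # derives 'refuse'

# A subversion in the Logicbreaks spirit: the prompt injects a rule that
# re-labels the harmful request as safe, so the chain now derives 'answer'.
injected_rules = guardrail_rules + [({"harmful", "roleplay"}, "safe")]
print(forward_chain(injected_rules, facts | {"roleplay"}))  # now also derives 'answer'
```

The point of the toy example is only that a single adversarially supplied premise or rule can change which conclusions the inference chain reaches; the paper's results characterize when such subversion remains possible even for models that follow their rules faithfully.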

For security professionals, this work offers critical insight into fundamental vulnerabilities in AI guardrails, helping them anticipate and mitigate exploits that bypass logical safety mechanisms.

Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
