Exploiting LLM Security Vulnerabilities in Structured Outputs

How prefix-tree mechanisms can be manipulated to bypass safety filters

This research reveals a novel jailbreak attack vector targeting structured output interfaces in Large Language Models, demonstrating how prefix-tree mechanisms can be exploited to generate harmful content despite safety measures.

  • Introduces the first attack targeting structured output interfaces like JSON and XML
  • Demonstrates how prefix completion features can bypass safety mechanisms
  • Achieves up to a 99.9% success rate in generating harmful content across various LLM platforms
  • Proposes potential defensive strategies including prefix monitoring and refined safety alignment
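
The prefix-monitoring defense in the last bullet can be illustrated with a minimal sketch. This summary does not describe how the authors implement it, so everything below is an assumption made for illustration: the names `PrefixMonitor` and `guarded_stream` are hypothetical, and a plain lowercase substring check stands in for whatever safety classifier a real deployment would use. The point it shows is that the check runs on the accumulated output prefix, not on each token in isolation.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class PrefixMonitor:
    """Hypothetical monitor: tracks the text emitted so far and flags blocked phrases."""
    blocked_phrases: List[str]  # assumed policy, expected in lowercase
    _prefix: str = field(default="", init=False)

    def accept(self, token: str) -> bool:
        """Append a token; return False once the accumulated prefix is flagged."""
        self._prefix += token
        lowered = self._prefix.lower()
        return not any(phrase in lowered for phrase in self.blocked_phrases)


def guarded_stream(tokens: Iterable[str], monitor: PrefixMonitor) -> str:
    """Emit tokens until the monitor rejects the running prefix, then stop."""
    emitted = []
    for tok in tokens:
        if not monitor.accept(tok):
            break  # halt generation instead of completing the flagged prefix
        emitted.append(tok)
    return "".join(emitted)


if __name__ == "__main__":
    monitor = PrefixMonitor(blocked_phrases=["forbidden"])
    # The blocked phrase is split across two tokens inside a JSON string value.
    print(guarded_stream(['{"a": "', "for", "bidden", ' text"}'], monitor))
```

Because the monitor inspects the whole prefix, a phrase split across several tokens or embedded inside a structured field is still caught at the token that completes it. A per-token filter would miss that case.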

This work highlights critical security vulnerabilities in commercial LLMs, showing how seemingly harmless interface choices can create significant safety gaps that malicious actors could exploit.

Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking
