Exploiting LLM Security Vulnerabilities in Structured Outputs

This research reveals a novel jailbreak attack vector that targets the structured output interfaces of large language models (LLMs), demonstrating how their prefix-tree mechanisms can be exploited to generate harmful content despite safety measures.

- Introduces the first attack targeting structured output interfaces such as JSON and XML
- Demonstrates how prefix completion features can bypass safety mechanisms
- Achieves an attack success rate of up to 99.9% in generating harmful content across various LLM platforms
- Proposes potential defensive strategies, including prefix monitoring and refined safety alignment
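
To make the "prefix monitoring" defense more concrete, here is a minimal Python sketch of one way such a monitor might look. It is not the paper's implementation: the function names `monitor_prefix` and `toy_is_unsafe`, the string-token stream, and the placeholder marker `BLOCKED_TOPIC` are all assumptions made for illustration. The idea shown is only that the safety check runs on the whole accumulated output prefix rather than on each token in isolation, since individual tokens can each look harmless.

```python
from typing import Callable, Iterable, Iterator


def monitor_prefix(
    tokens: Iterable[str],
    is_unsafe: Callable[[str], bool],
) -> Iterator[str]:
    """Yield tokens until the accumulated prefix is flagged as unsafe.

    Hypothetical helper: `is_unsafe` stands in for whatever safety
    classifier a real deployment would use.
    """
    prefix = ""
    for token in tokens:
        candidate = prefix + token
        # Check the full prefix, not just the newest token.
        if is_unsafe(candidate):
            return  # stop generation before emitting the flagged token
        prefix = candidate
        yield token


def toy_is_unsafe(text: str) -> bool:
    # Placeholder policy (an assumption): flag a fixed marker string.
    return "BLOCKED_TOPIC" in text


# Neither "BLOCKED" nor "_TOPIC" is flagged alone; their concatenation is.
stream = ['{"answer": "', "BLOCKED", "_TOPIC", ' details"}']
emitted = "".join(monitor_prefix(stream, toy_is_unsafe))
print(emitted)
```

In this toy run the monitor stops as soon as the accumulated prefix contains the marker, so the remaining tokens are never emitted. A real system would replace `toy_is_unsafe` with an actual safety classifier and would also need to decide how often to run it, which this sketch does not address.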

This work highlights critical security vulnerabilities in commercial LLMs, showing how seemingly harmless interface choices can create significant safety gaps that malicious actors could exploit.

Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking