
Exploiting Vision-Language Models
New Black-Box Jailbreak Attack Maximizes Toxic Outputs in LVLMs
This research introduces a novel bimodal jailbreak attack that exploits the interaction between images and text to compromise Large Vision-Language Models (LVLMs) without requiring access to model weights or gradients.
- Leverages prior knowledge about harmful content to guide the image-text attack strategy (see the sketch after this list)
- Achieves attack success rates of up to 83.29% in black-box settings
- Demonstrates how multi-modal inputs create unique security vulnerabilities
- Reveals limitations in current safety mechanisms for vision-language models
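To make the bimodal interaction concrete, the code below sketches what such a prior-guided, black-box toxicity-maximization loop could look like. This is a minimal sketch under assumed interfaces, not the authors' PBI-Attack implementation: `query_lvlm`, `toxicity_score`, `perturb_image`, and `mutate_prompt` are hypothetical stubs, and the greedy alternation between image and text perturbations is only one plausible strategy.

```python
import random

# Minimal sketch of a prior-guided bimodal black-box attack loop.
# Every helper here is an illustrative stub, not the authors' implementation.

def query_lvlm(image, prompt):
    """Stub: send (image, prompt) to the target LVLM and return its text response."""
    return f"model response to {prompt!r}"

def toxicity_score(response):
    """Stub: score the response with an external toxicity classifier (higher = more toxic)."""
    return random.random()

def perturb_image(image, prior):
    """Stub: perturb the image, biased by prior knowledge of harmful content."""
    return image

def mutate_prompt(prompt, prior):
    """Stub: rewrite the text prompt using the same harmful-content prior."""
    return prompt

def bimodal_black_box_attack(image, prompt, prior, iterations=100):
    """Greedy hill climbing over both modalities: alternately perturb the image
    or the prompt, query the model as a black box, and keep whichever pair
    yields the highest toxicity score so far."""
    best_image, best_prompt = image, prompt
    best_score = toxicity_score(query_lvlm(best_image, best_prompt))

    for _ in range(iterations):
        if random.random() < 0.5:
            candidate = (perturb_image(best_image, prior), best_prompt)
        else:
            candidate = (best_image, mutate_prompt(best_prompt, prior))

        # Only the model's output is observed; no gradients or weights are needed.
        score = toxicity_score(query_lvlm(*candidate))
        if score > best_score:
            best_image, best_prompt = candidate
            best_score = score

    return best_image, best_prompt, best_score
```

The key point the sketch illustrates is that the attacker optimizes both modalities jointly while treating the target model purely as a query oracle.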
Implications for Security: This work exposes critical vulnerabilities in current LVLMs, highlighting the need for more robust multi-modal safety mechanisms before these models are widely deployed in sensitive applications.
Paper: PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization