Exploiting Vision-Language Models

New Black-Box Jailbreak Attack Maximizes Toxic Outputs in LVLMs

This research introduces a novel bimodal jailbreak attack, PBI-Attack, that exploits the interaction between images and text to compromise Large Vision-Language Models (LVLMs) without requiring access to model internals such as weights or gradients.

  • Leverages prior knowledge of harmful content to guide the image-text attack strategy (see the sketch after this list)
  • Achieves attack success rates of up to 83.29% in black-box settings
  • Demonstrates how multi-modal inputs create unique security vulnerabilities
  • Reveals limitations in current safety mechanisms for vision-language models
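
The bullets above describe an interactive, query-only optimization over both modalities. Below is a minimal sketch of that idea, assuming a greedy alternating hill climb; the names `query_lvlm`, `toxicity_score`, `perturb_image`, and `mutate_prompt` are hypothetical stand-ins for the paper's components, not its released interfaces.

```python
import random
import numpy as np

def query_lvlm(image: np.ndarray, prompt: str) -> str:
    """Stub for the black-box target: only input-output access is assumed."""
    # In a real evaluation this would call the deployed LVLM's API.
    return "model response"

def toxicity_score(text: str) -> float:
    """Stub for an external toxicity scorer that supplies the optimization signal."""
    return random.random()

def perturb_image(image: np.ndarray, strength: float = 8.0) -> np.ndarray:
    """Apply a small random pixel perturbation (the paper's prior-guided update is assumed richer)."""
    noise = np.random.uniform(-strength, strength, image.shape)
    return np.clip(image + noise, 0, 255)

def mutate_prompt(prompt: str, prior_phrases: list[str]) -> str:
    """Splice in a phrase drawn from a harmful-content prior to steer the text modality."""
    return f"{prompt} {random.choice(prior_phrases)}"

def bimodal_blackbox_attack(image, prompt, prior_phrases, iters=50):
    """Alternate image and text updates, keeping whichever candidate maximizes toxicity."""
    best_img, best_prompt = image, prompt
    best_score = toxicity_score(query_lvlm(best_img, best_prompt))
    for step in range(iters):
        if step % 2 == 0:  # image turn
            cand_img, cand_prompt = perturb_image(best_img), best_prompt
        else:              # text turn
            cand_img, cand_prompt = best_img, mutate_prompt(best_prompt, prior_phrases)
        score = toxicity_score(query_lvlm(cand_img, cand_prompt))
        if score > best_score:  # greedy hill climbing on the toxicity signal
            best_img, best_prompt, best_score = cand_img, cand_prompt, score
    return best_img, best_prompt, best_score
```

Note that every step uses only input-output queries, which is what makes the setting black-box: the external toxicity scorer stands in for the gradient signal a white-box attacker would have.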

Implications for Security: This work exposes critical vulnerabilities in current LVLMs, highlighting the need for more robust multi-modal safety mechanisms before these models are deployed widely in sensitive applications.

PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization
