Securing Vision-Language Models Against Noise Attacks

New defenses against jailbreak attacks using noisy images

This research identifies a critical security gap in Vision-Language Models (VLMs) that makes them vulnerable to jailbreak attacks using noise-augmented images, and proposes effective defense mechanisms.

  • VLMs are susceptible to perturbation-based attacks in which simple Gaussian noise added to an image can bypass safety measures (see the sketch after this list)
  • The paper introduces Robust-VLGuard and DiffPure-VLM, two defense strategies that significantly improve security
  • Testing across multiple VLMs showed these defenses can reduce attack success rates by up to 90%
  • Implementation requires minimal computational overhead while maintaining model performance
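
The perturbation referenced in the first bullet is simple to reproduce in principle: sample Gaussian noise, scale it, and add it to the image before it reaches the model. The sketch below is illustrative only; the noise level, clamping, and tensor shapes are assumptions rather than the paper's exact attack configuration.

```python
# Minimal sketch of a Gaussian-noise perturbation on an image tensor.
# The sigma value and the [0, 1] value range are illustrative assumptions.
import torch

def add_gaussian_noise(image: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Perturb a normalized image (values in [0, 1]) with additive Gaussian noise."""
    noise = torch.randn_like(image) * sigma
    return (image + noise).clamp(0.0, 1.0)

# Example: perturb a dummy 3x224x224 image before it is fed to a VLM's vision
# encoder; an attack would pair the noisy image with a harmful text prompt.
clean = torch.rand(3, 224, 224)
noisy = add_gaussian_noise(clean, sigma=0.1)
```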

For security professionals, this research highlights the importance of accounting for noise robustness during VLM training and deployment, which closes a significant vulnerability in AI systems that process both visual and textual content.
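
One hedged way to act on that recommendation is to mix noisy views of training images into safety fine-tuning data, in the spirit of noise-augmented training. The helper below is a conceptual sketch with assumed noise levels and batch shapes, not the released Robust-VLGuard pipeline.

```python
# Conceptual sketch of noise-augmented data preparation for fine-tuning.
# The sigma values and batch shapes are illustrative assumptions.
import random
import torch

def augment_batch(images: torch.Tensor, sigmas=(0.0, 0.05, 0.1)) -> torch.Tensor:
    """Return a copy of the batch where each image gets a randomly chosen noise level."""
    augmented = []
    for img in images:
        sigma = random.choice(sigmas)
        augmented.append((img + torch.randn_like(img) * sigma).clamp(0.0, 1.0))
    return torch.stack(augmented)

# Mixing clean (sigma = 0) and noisy views during fine-tuning encourages the
# model to behave consistently, including on safety prompts, whether or not
# the input image has been perturbed.
batch = torch.rand(8, 3, 224, 224)
augmented = augment_batch(batch)
```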

Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
