Fortifying Vision-Language Models Against Attacks

A novel preference optimization approach for robust LVLMs

This research introduces AdPO (Adversarial Preference Optimization), a method that defends large vision-language models (LVLMs) against adversarial attacks without sacrificing performance on clean inputs.

  • Addresses critical security vulnerabilities in LVLMs like GPT-4o and LLaVA
  • Improves robustness against adversarial attacks that could cause erroneous or malicious outputs
  • Overcomes limitations of existing defenses that typically degrade clean input performance
  • Employs preference optimization to maintain model effectiveness while enhancing security (illustrated in the sketch below)
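
The summary above does not spell out the training objective, so the following is a minimal, hypothetical sketch of how preference optimization can be applied to adversarial robustness: a DPO-style loss that prefers the response generated from the clean image over the response induced by an adversarially perturbed image. The function name adversarial_dpo_loss, the beta value, and the toy inputs are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def adversarial_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log-prob of the clean-image response under the policy
    policy_rejected_logps: torch.Tensor,  # log-prob of the adversarial-image response under the policy
    ref_chosen_logps: torch.Tensor,       # same quantities under a frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,                    # temperature controlling preference strength (assumed value)
) -> torch.Tensor:
    """DPO-style objective: widen the policy's margin for the clean-image
    response over the adversarially induced one, anchored to a reference model."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Standard DPO form: -log sigmoid(beta * (policy margin - reference margin))
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy usage: random log-probabilities for a batch of 4 preference pairs
torch.manual_seed(0)
loss = adversarial_dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss)
```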

As LVLMs are increasingly deployed in real-world applications, such security improvements are essential to prevent exploitation and to ensure reliable performance in critical environments.

AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
