
Fortifying Vision-Language Models Against Attacks
A novel preference optimization approach for robust LVLMs
This research introduces AdPO (Adversarial Preference Optimization), a method that defends large vision-language models (LVLMs) against adversarial attacks without sacrificing performance on clean inputs.
- Addresses critical security vulnerabilities in LVLMs like GPT-4o and LLaVA
- Improves robustness against adversarial attacks that could cause erroneous or malicious outputs
- Overcomes a key limitation of existing defenses, which typically degrade performance on clean inputs
- Employs preference optimization to maintain model effectiveness while enhancing security (see the sketch after this list)
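
The summary does not spell out the training objective, but since AdPO builds on preference optimization, the sketch below illustrates one plausible instantiation: a DPO-style loss in which the response conditioned on the clean image is treated as the preferred output and the response conditioned on an adversarially perturbed image as the dispreferred one. The function names, the pairing scheme, and the `beta` hyperparameter are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (not the authors' implementation) of a DPO-style preference
# loss where the clean-image response is preferred over the adversarial-image
# response. All names and the pairing scheme are illustrative assumptions.
import torch
import torch.nn.functional as F

def adversarial_preference_loss(
    policy_logp_clean: torch.Tensor,   # log p_theta(response | prompt, clean image)
    policy_logp_adv: torch.Tensor,     # log p_theta(response | prompt, adversarial image)
    ref_logp_clean: torch.Tensor,      # same log-probs under a frozen reference model
    ref_logp_adv: torch.Tensor,
    beta: float = 0.1,                 # strength of the implicit KL regularization
) -> torch.Tensor:
    """Widen the margin between the preferred (clean-image) and dispreferred
    (adversarial-image) responses, measured relative to the reference model."""
    chosen = beta * (policy_logp_clean - ref_logp_clean)
    rejected = beta * (policy_logp_adv - ref_logp_adv)
    return -F.logsigmoid(chosen - rejected).mean()

# Toy usage: random tensors stand in for per-example sequence log-probabilities.
if __name__ == "__main__":
    lp_clean, lp_adv = torch.randn(4), torch.randn(4)
    rp_clean, rp_adv = torch.randn(4), torch.randn(4)
    print(adversarial_preference_loss(lp_clean, lp_adv, rp_clean, rp_adv))
```

An objective of this shape pushes the model's behavior on perturbed images toward its behavior on clean ones, which is consistent with the stated goal of improving robustness without degrading clean-input performance.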
As LVLMs are increasingly deployed in real-world applications, such security improvements are essential to prevent exploitation and ensure reliable performance in critical environments.