
GLOV: Leveraging LLMs to Enhance Vision Models
Using language models as optimization guides for vision-language systems
GLOV uses large language models (LLMs) as implicit optimizers that iteratively refine the prompts given to vision-language models (VLMs), improving both downstream performance and safety.
- Improves zero-shot classification accuracy with CLIP by generating optimized prompts
- Reduces attack success rates on state-of-the-art VLMs by up to 60.7%
- Creates a feedback loop in which the LLM iteratively refines VLM prompts based on their measured performance (see the sketch after this list)
- Achieves these improvements without additional training or model modification
This research is significant for security applications because it demonstrates how LLM-guided prompt optimization can make vision-language models more resistant to adversarial attacks, helping protect AI systems in critical deployments.
Paper: GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models