
Enhancing Safety in Visual AI
Addressing Critical Gaps in Vision-Language Model Safety
This research identifies and addresses fundamental weaknesses in current safety fine-tuning approaches for Vision-Language Models (VLMs).
- Reveals a safety reasoning gap where models fail to properly analyze visual content in safety-critical contexts
- Introduces a novel Multi-Image Safety dataset specifically designed for training VLMs on safety scenarios
- Proposes an effective Safety Reasoning Fine-tuning approach that improves model safety without compromising helpfulness
- Demonstrates significant reduction in attack success rates while maintaining model utility
For security professionals, this research offers crucial insights into protecting VLM deployments against manipulation while preserving functionality in safety-critical applications.
Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models