Bridging Safety Gaps in Vision-Language Models

This research addresses critical security vulnerabilities in Large Vision-Language Models (LVLMs) where text safety mechanisms fail to transfer to visual inputs.

Identifies why current vision-language alignment methods don't transfer text safety to vision modalities
Maps the operational mechanisms of safety systems within LVLMs
Conducts comparative analysis between text and vision safety processing
Develops techniques to strengthen protection against harmful visual content

This work is crucial for security as it helps protect AI systems from exploitation through toxic imagery, ensuring safer deployment of multimodal AI systems in real-world applications.

Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models