
Making Multimodal AI Assistants Safer
Enhancing MLLM safety through preference optimization
This research addresses the critical challenge of safety alignment in Multimodal Large Language Models (MLLMs) through a novel preference dataset and a new optimization technique.
- Created MMSafe-PO, a specialized safety preference dataset for multimodal models
- Developed Blind Preference Optimization (BPO) to enhance safety without compromising capabilities
- Achieved significant reduction in unsafe responses across multiple benchmarks
- Demonstrated effective transfer of safety capabilities to new harmful scenarios
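The summary does not spell out the BPO objective, but preference optimization methods of this kind typically train the model to prefer a safe response over an unsafe one for the same (image, prompt) pair. As a hedged illustration only, here is a minimal sketch of a standard DPO-style pairwise loss (not necessarily the paper's exact formulation); `pi_*` and `ref_*` are assumed to be summed log-probabilities under the policy and a frozen reference model:

```python
import math

def preference_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss for one preference pair (illustrative sketch).

    Inputs are summed log-probabilities of the chosen (safe) and
    rejected (unsafe) responses under the policy being trained (pi_*)
    and a frozen reference model (ref_*). Minimizing the loss pushes
    the policy toward the chosen response relative to the reference,
    and away from the rejected one.
    """
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log sigmoid(logits): small when the policy favors the safe response
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

For example, a policy that assigns higher relative likelihood to the safe response incurs a lower loss than one favoring the unsafe response, which is what drives unsafe completions down during training.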
This work provides a scalable approach to mitigating safety risks in multimodal AI systems while maintaining their performance on core tasks, addressing a key security concern for enterprise AI deployment.
Original Paper: Towards Harmless Multimodal Assistants with Blind Preference Optimization