SafeEraser: Making AI Forget Harmful Content

Advancing safety through multimodal machine unlearning

SafeEraser introduces a benchmark for safety unlearning in Multimodal Large Language Models (MLLMs): removing harmful knowledge from a trained model while preserving its beneficial capabilities.

  • Creates a safety unlearning benchmark with 3,000 images and 28.8K VQA pairs
  • Enables targeted removal of unsafe content while maintaining model performance (a minimal unlearning sketch follows this list)
  • Provides a framework for evaluating unlearning effectiveness in multimodal models
  • Addresses critical safety risks in increasingly capable AI systems
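
A common baseline recipe for this kind of safety unlearning pairs gradient ascent on a "forget" set (pushing the model away from harmful answers) with gradient descent on a "retain" set (anchoring its normal behavior). The sketch below illustrates that generic recipe only, not SafeEraser's exact training objective; the HuggingFace-style model interface (a forward pass that returns a `.loss` when labels are supplied), the batch format, and the `alpha` weighting are all assumptions for illustration.

```python
# Minimal sketch of a forget/retain unlearning step, assuming a
# HuggingFace-style model whose forward pass returns an object with `.loss`.
import torch

def unlearning_step(
    model: torch.nn.Module,
    optimizer: torch.optim.Optimizer,
    forget_batch: dict,   # harmful VQA examples to unlearn (includes labels)
    retain_batch: dict,   # benign examples used to preserve utility
    alpha: float = 1.0,   # hypothetical weight balancing retention vs. forgetting
):
    optimizer.zero_grad()

    # Gradient ascent on the forget set: negate the loss so that
    # minimizing the total objective *increases* loss on harmful answers.
    forget_loss = -model(**forget_batch).loss

    # Gradient descent on the retain set: keep benign behavior intact.
    retain_loss = model(**retain_batch).loss

    (forget_loss + alpha * retain_loss).backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```

In practice, methods in this space tune `alpha` (and often add further regularizers) to trade off how thoroughly harmful knowledge is erased against how much general capability is retained, which is exactly the tension the benchmark is designed to measure.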

This research is vital for developing safer AI systems as MLLMs become more widely deployed in consumer applications, helping prevent harmful outputs while maintaining functionality.

SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
