SafeEraser: Making AI Forget Harmful Content

Advancing safety through multimodal machine unlearning

SafeEraser introduces a benchmark for safety unlearning in Multimodal Large Language Models (MLLMs): removing harmful knowledge from a trained model while preserving its beneficial capabilities.

  • Creates a safety unlearning benchmark with 3,000 images and 28.8K VQA pairs
  • Enables targeted removal of unsafe content while maintaining model performance (a minimal unlearning sketch follows this list)
  • Provides a framework for evaluating unlearning effectiveness in multimodal models
  • Addresses critical safety risks in increasingly capable AI systems
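
A common baseline recipe for this kind of safety unlearning pairs gradient ascent on a "forget" set (pushing the model away from harmful answers) with gradient descent on a "retain" set (anchoring its normal behavior). The sketch below illustrates that generic recipe only, not SafeEraser's exact training objective; the HuggingFace-style model interface (a forward pass that returns a `.loss` when labels are supplied), the batch format, and the `alpha` weighting are all assumptions for illustration.

```python
# Minimal sketch of a forget/retain unlearning step, assuming a
# HuggingFace-style model whose forward pass returns an object with `.loss`.
import torch

def unlearning_step(
    model: torch.nn.Module,
    optimizer: torch.optim.Optimizer,
    forget_batch: dict,   # harmful VQA examples to unlearn (includes labels)
    retain_batch: dict,   # benign examples used to preserve utility
    alpha: float = 1.0,   # hypothetical weight balancing retention vs. forgetting
):
    optimizer.zero_grad()

    # Gradient ascent on the forget set: negate the loss so that
    # minimizing the total objective *increases* loss on harmful answers.
    forget_loss = -model(**forget_batch).loss

    # Gradient descent on the retain set: keep benign behavior intact.
    retain_loss = model(**retain_batch).loss

    (forget_loss + alpha * retain_loss).backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```

In practice, methods in this space tune `alpha` (and often add further regularizers) to trade off how thoroughly harmful knowledge is erased against how much general capability is retained, which is exactly the tension the benchmark is designed to measure.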

This research is vital for developing safer AI systems as MLLMs become more widely deployed in consumer applications, helping prevent harmful outputs while maintaining functionality.

SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
