
Safer AI Through Selective Forgetting
Precision-targeted knowledge removal in large language models
FALCON introduces a novel approach to machine unlearning that precisely removes sensitive information from language models while preserving overall model utility.
- Uses fine-grained activation manipulation rather than the conventional weighted combination of forget and retain losses
- Applies contrastive orthogonal vectors to target specific knowledge for removal
- Achieves superior results in removing harmful content without degrading overall model performance
- Demonstrates significant improvements in preventing information recovery attacks
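The paper's exact mechanism isn't reproduced here, but activation-level unlearning with orthogonal vectors generally works by projecting a model's hidden states onto the subspace orthogonal to a learned "forget" direction, zeroing out the component that encodes the targeted knowledge. A minimal NumPy sketch of that projection step (all names and the toy data are illustrative, not FALCON's actual implementation):

```python
import numpy as np

def remove_direction(hidden, forget_dir):
    """Project hidden activations onto the subspace orthogonal to forget_dir.

    hidden:     (n_tokens, d_model) activation matrix
    forget_dir: (d_model,) direction associated with the knowledge to remove
    """
    u = forget_dir / np.linalg.norm(forget_dir)   # unit vector
    # subtract each activation's component along u
    return hidden - np.outer(hidden @ u, u)

# Toy example: 4 token activations with hidden size 8
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
d = rng.normal(size=8)        # hypothetical "forget" direction

h_clean = remove_direction(h, d)
print(np.allclose(h_clean @ d, 0))  # True: no component along d remains
```

After projection, any probe along the forget direction reads zero, which illustrates why this style of edit can blunt recovery attacks: the targeted component is removed from the representation itself rather than merely down-weighted by a loss term.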
This research addresses critical security concerns: a more precise way to eliminate harmful or sensitive information from AI systems reduces the risk of data leaks and malicious exploitation while keeping the model functional.