
Selective Forgetting for Safer LLMs
A robust approach to removing sensitive knowledge from language models
This research introduces Penalty Regularized Gradient Ascent (PR-GA), a novel method for removing targeted knowledge from large language models without degrading their overall performance (a minimal code sketch follows the highlights below).
- Achieves 7.5× more stable optimization than standard gradient ascent methods
- Maintains model functionality while precisely removing unwanted knowledge
- Requires minimal parameter updates, making it efficient for deployment
- Demonstrates effectiveness across multiple knowledge domains and model types
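This summary describes PR-GA only at a high level, so the snippet below is a minimal sketch rather than the paper's exact formulation. It assumes a PyTorch/Hugging Face setup, treats the penalty as a KL divergence to a frozen reference copy of the model, and illustrates "minimal parameter updates" by unfreezing only the final transformer block; `pr_ga_step`, `lam`, and the specific penalty choice are all illustrative assumptions, not the authors' verbatim method.

```python
# Minimal sketch of penalty-regularized gradient ascent unlearning.
# Assumptions (not from the summary): PyTorch + Hugging Face transformers,
# a KL-to-reference penalty, and a hypothetical weight `lam`.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")      # model to unlearn
ref_model = AutoModelForCausalLM.from_pretrained("gpt2")  # frozen reference copy
ref_model.eval()

# "Minimal parameter updates": freeze everything except a small subset
# (here, illustratively, the final transformer block).
for p in model.parameters():
    p.requires_grad = False
for p in model.transformer.h[-1].parameters():
    p.requires_grad = True

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)

def pr_ga_step(batch, lam=0.1):
    """One unlearning step on a batch drawn from the forget set.

    `batch` is assumed to contain input_ids, attention_mask, and labels.
    """
    out = model(**batch)
    ascent_loss = -out.loss  # gradient *ascent* on the forget data

    # Penalty term: stay close to the reference model's output
    # distribution, which is what stabilizes plain gradient ascent
    # in this sketch.
    with torch.no_grad():
        ref_logits = ref_model(**batch).logits
    penalty = F.kl_div(
        F.log_softmax(out.logits, dim=-1),
        F.softmax(ref_logits, dim=-1),
        reduction="batchmean",
    )

    loss = ascent_loss + lam * penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The intuition behind the penalty term is that unconstrained gradient ascent on forget data tends to diverge and damage unrelated capabilities; anchoring the model to its reference behavior counteracts that, consistent with the stability improvement highlighted above.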
Why it matters: This technique addresses critical privacy and security challenges by enabling selective removal of sensitive data from LLMs without costly retraining. It significantly reduces the risk of data leakage while preserving model capabilities.
Paper: Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs