
Enhancing LLM Unlearning
A framework to remove sensitive data while preserving model utility
This research introduces a general framework that improves existing fine-tuning-based unlearning methods for large language models while preserving the models' utility on normal tasks.
- Addresses the challenge of removing copyrighted and privacy-sensitive data from LLMs
- Enhances both gradient-ascent-based and suppression-based unlearning approaches (see the sketch of gradient-ascent unlearning after this list)
- Preserves model performance on standard tasks while effectively removing target information
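To make the gradient-ascent family concrete, here is a minimal toy sketch of one unlearning update: the loss on the forget data is maximized (gradient ascent) while a standard loss on retained data is minimized to preserve utility. This is an illustrative assumption, not the paper's framework; `unlearning_step`, the stand-in classifier, the `lam` weight, and the random data are hypothetical placeholders, and a real setup would use an LLM with token-level language-modeling losses.

```python
# Toy sketch of gradient-ascent-based unlearning with a retain-set penalty.
# Illustrative only -- not the method proposed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

def unlearning_step(model, forget_batch, retain_batch, optimizer, lam=1.0):
    """One update: ascend the loss on forget data, descend on retain data."""
    model.train()
    optimizer.zero_grad()

    # Loss on the data to be forgotten (we *maximize* this term).
    fx, fy = forget_batch
    forget_loss = F.cross_entropy(model(fx), fy)

    # Standard loss on retained data keeps general utility intact.
    rx, ry = retain_batch
    retain_loss = F.cross_entropy(model(rx), ry)

    # Gradient ascent on the forget loss == descent on its negation.
    total = -forget_loss + lam * retain_loss
    total.backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()

# Usage with a stand-in classifier (replace with an LLM and token targets).
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
forget = (torch.randn(4, 16), torch.randint(0, 8, (4,)))
retain = (torch.randn(4, 16), torch.randint(0, 8, (4,)))
print(unlearning_step(model, forget, retain, opt))
```

The retain-set term is what plays the utility-preserving role highlighted in the bullets above; without it, pure gradient ascent on the forget data tends to degrade the model broadly.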
This work matters for security practitioners because it offers a practical way to selectively remove sensitive data from deployed LLMs without compromising overall performance, helping organizations meet compliance requirements while preserving valuable AI assets.
A General Framework to Enhance Fine-tuning-based LLM Unlearning