
Selective Skill Unlearning in LLMs
Training-free techniques to control model capabilities
This research introduces two lightweight methods to selectively remove specific skills from LLMs while preserving their overall functionality.
- Intervention Strategy: Intervenes on skill-specific neurons during generation to suppress the undesired capability (see the first sketch after this list)
- Abstention Mechanism: Enables the model to recognize requests that depend on an unlearned skill and decline them (see the second sketch after this list)
- Security Impact: Provides practical tools to mitigate risks from potentially harmful model capabilities
- Implementation Advantage: Neither method requires retraining, making them efficient and inexpensive to deploy
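
A minimal sketch of the intervention idea, assuming it operates on feed-forward pre-activations via a runtime hook. The neuron indices, benign baseline value, toy module, and hook placement are illustrative assumptions, not the paper's actual procedure:

```python
# Hypothetical sketch: suppress "skill neurons" in a feed-forward layer by
# hooking the up-projection and overwriting targeted pre-activations with a
# benign baseline before the nonlinearity. All constants are placeholders.
import torch
import torch.nn as nn

SKILL_NEURONS = [3, 7]   # hypothetical indices of skill-specific neurons
BENIGN_MEAN = 0.0        # hypothetical benign pre-activation baseline

class FFN(nn.Module):
    """Toy transformer feed-forward block standing in for a real LLM layer."""
    def __init__(self, d_model=16, d_hidden=32):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.act = nn.GELU()
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        return self.down(self.act(self.up(x)))

def intervene(module, inputs, output):
    # Fires after the up-projection: replace the targeted neurons'
    # pre-activations with the benign baseline, leaving the rest intact.
    out = output.clone()
    out[..., SKILL_NEURONS] = BENIGN_MEAN
    return out

ffn = FFN()
ffn.up.register_forward_hook(intervene)    # applied on every forward pass
print(ffn(torch.randn(1, 4, 16)).shape)    # torch.Size([1, 4, 16])
```

Because the hook runs at inference time, no weights change and the original capability can be restored by simply removing the hook.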
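A minimal sketch of the abstention idea, assuming incoming prompts are screened by whether their representation falls inside a region fit around known skill-related prompts. The embedding source, centroid-plus-radius region, and margin are assumptions for illustration only:

```python
# Hypothetical sketch: refuse requests whose embedding lands inside a
# hypersphere fit around embeddings of skill-related prompts.
import numpy as np

def fit_skill_region(skill_embeddings: np.ndarray):
    """Centroid and radius covering known skill-related prompt embeddings."""
    center = skill_embeddings.mean(axis=0)
    radius = np.linalg.norm(skill_embeddings - center, axis=1).max()
    return center, radius

def maybe_abstain(prompt_embedding, center, radius, margin=1.0):
    """Return a refusal if the prompt resembles the unlearned skill, else None."""
    if np.linalg.norm(prompt_embedding - center) <= radius * margin:
        return "I can't help with that request."
    return None  # proceed with normal generation

rng = np.random.default_rng(0)
skill = rng.normal(loc=2.0, size=(100, 8))  # stand-in skill-prompt embeddings
center, radius = fit_skill_region(skill)
print(maybe_abstain(rng.normal(loc=2.0, size=8), center, radius))   # refuses
print(maybe_abstain(rng.normal(loc=-2.0, size=8), center, radius))  # None
```

The check happens before generation, so unrelated requests pass through to the model untouched.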
These approaches give security teams practical options for limiting model capabilities without degrading overall performance.
Source paper: Effective Skill Unlearning through Intervention and Abstention