Selective Skill Unlearning in LLMs

Training-free techniques to control model capabilities

This research introduces two lightweight methods to selectively remove specific skills from LLMs while preserving their overall functionality.

  • Intervention Strategy: Targets skill-specific tokens during generation to suppress the unwanted capability
  • Abstention Mechanism: Enables models to recognize and decline requests requiring unlearned skills
  • Security Impact: Provides practical tools to mitigate risks from potentially harmful model capabilities
  • Implementation Advantage: Methods require no retraining, making them efficient and accessible to deploy
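The intervention idea can be illustrated as a decoding-time logit mask. This is a minimal sketch, not the paper's actual procedure: the notion of a fixed set of "skill-specific" token IDs and the greedy selection step are illustrative assumptions.

```python
def intervene(logits, skill_token_ids, penalty=float("-inf")):
    """Return a copy of `logits` with skill-specific tokens masked out.

    `skill_token_ids` is a hypothetical, precomputed set of vocabulary
    indices associated with the skill to be unlearned.
    """
    masked = list(logits)
    for tid in skill_token_ids:
        masked[tid] = penalty
    return masked

def greedy_step(logits, skill_token_ids):
    """Pick the highest-scoring token after the intervention."""
    masked = intervene(logits, skill_token_ids)
    return max(range(len(masked)), key=lambda i: masked[i])

# Toy vocabulary of 5 tokens; token 2 belongs to the unlearned skill.
logits = [0.1, 0.3, 2.0, 0.5, -1.0]
print(greedy_step(logits, skill_token_ids={2}))  # token 3 wins instead of 2
```

Because the mask is applied per decoding step, no weights are modified, which is what makes the approach training-free.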

These approaches offer security teams practical solutions for controlling AI capabilities without compromising overall model performance.
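The abstention mechanism can be sketched as a wrapper that declines requests detected as requiring the unlearned skill. The keyword check below is a crude stand-in for whatever skill detector the method actually uses; all names here are illustrative assumptions.

```python
REFUSAL = "I can't help with that request."

def requires_unlearned_skill(prompt, skill_keywords):
    """Crude stand-in for a learned skill classifier."""
    return any(kw in prompt.lower() for kw in skill_keywords)

def respond(prompt, generate, skill_keywords):
    """Abstain when the prompt needs the removed skill, else generate."""
    if requires_unlearned_skill(prompt, skill_keywords):
        return REFUSAL
    return generate(prompt)

print(respond("Translate this to French",
              lambda p: "Bonjour",
              skill_keywords={"translate"}))
# prints the refusal, since the prompt matches the unlearned skill
```

Unlike the token-level intervention, abstention operates at the request level, so the model produces a coherent decline rather than degraded output.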

Effective Skill Unlearning through Intervention and Abstention
