Selective Skill Unlearning in LLMs

Training-free techniques to control model capabilities

This research introduces two lightweight methods to selectively remove specific skills from LLMs while preserving their overall functionality.

  • Intervention Strategy: Targets skill-specific tokens during generation to suppress the unwanted capability
  • Abstention Mechanism: Enables models to recognize and decline requests requiring unlearned skills
  • Security Impact: Provides practical tools to mitigate risks from potentially harmful model capabilities
  • Implementation Advantage: Methods require no retraining, making them efficient and accessible to deploy
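The intervention idea can be illustrated as a decoding-time logit mask. This is a minimal sketch, not the paper's actual procedure: the notion of a fixed set of "skill-specific" token IDs and the greedy selection step are illustrative assumptions.

```python
def intervene(logits, skill_token_ids, penalty=float("-inf")):
    """Return a copy of `logits` with skill-specific tokens masked out.

    `skill_token_ids` is a hypothetical, precomputed set of vocabulary
    indices associated with the skill to be unlearned.
    """
    masked = list(logits)
    for tid in skill_token_ids:
        masked[tid] = penalty
    return masked

def greedy_step(logits, skill_token_ids):
    """Pick the highest-scoring token after the intervention."""
    masked = intervene(logits, skill_token_ids)
    return max(range(len(masked)), key=lambda i: masked[i])

# Toy vocabulary of 5 tokens; token 2 belongs to the unlearned skill.
logits = [0.1, 0.3, 2.0, 0.5, -1.0]
print(greedy_step(logits, skill_token_ids={2}))  # token 3 wins instead of 2
```

Because the mask is applied per decoding step, no weights are modified, which is what makes the approach training-free.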

These approaches offer security teams practical solutions for controlling AI capabilities without compromising overall model performance.
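The abstention mechanism can be sketched as a wrapper that declines requests detected as requiring the unlearned skill. The keyword check below is a crude stand-in for whatever skill detector the method actually uses; all names here are illustrative assumptions.

```python
REFUSAL = "I can't help with that request."

def requires_unlearned_skill(prompt, skill_keywords):
    """Crude stand-in for a learned skill classifier."""
    return any(kw in prompt.lower() for kw in skill_keywords)

def respond(prompt, generate, skill_keywords):
    """Abstain when the prompt needs the removed skill, else generate."""
    if requires_unlearned_skill(prompt, skill_keywords):
        return REFUSAL
    return generate(prompt)

print(respond("Translate this to French",
              lambda p: "Bonjour",
              skill_keywords={"translate"}))
# prints the refusal, since the prompt matches the unlearned skill
```

Unlike the token-level intervention, abstention operates at the request level, so the model produces a coherent decline rather than degraded output.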

Effective Skill Unlearning through Intervention and Abstention
