Defending LLMs Against Jailbreak Attacks

A Lightweight Token Distribution Approach for Enhanced Security

LightDefense offers a resource-efficient solution to protect Large Language Models from jailbreak attacks by strategically adjusting token probabilities.

  • Shifts the token distribution to prioritize safety disclaimers, without any additional model training
  • Targets white-box models, re-ranking the vocabulary so that safety-oriented tokens are favored
  • Preserves the model's normal functionality while substantially strengthening its defense against harmful prompts
  • Reduces reliance on extensive data collection and auxiliary models
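The token-shifting idea in the bullets above can be illustrated with a toy sketch. It is not the authors' implementation. The names `shift_distribution`, `safety_ids`, and `max_boost` are hypothetical. Scaling the boost by the normalized entropy of the next-token distribution is an assumption: it is one possible reading of "uncertainty-driven" in the paper's title, not a published formula.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a plain list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)

def shift_distribution(logits, safety_ids, max_boost=5.0):
    """Hypothetical sketch: raise the logits of assumed safety-disclaimer
    tokens, with a boost that grows with the model's uncertainty."""
    if len(logits) < 2:
        return softmax(logits)
    probs = softmax(logits)
    # Normalized entropy in [0, 1]: 1 for a uniform distribution, ~0 when peaked.
    # (ASSUMPTION: this uncertainty measure is illustrative only.)
    uncertainty = entropy(probs) / math.log(len(logits))
    boost = max_boost * uncertainty
    shifted = list(logits)
    for i in safety_ids:
        shifted[i] += boost
    return softmax(shifted)
```

In a real white-box deployment, a shift like this would be applied to the model's full-vocabulary logits at each decoding step. The toy four-token vocabulary here only shows the effect: safety tokens gain probability mass, and the gain is larger when the next-token distribution is uncertain.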

This research gives security teams a practical defense that can be deployed to protect AI systems without degrading performance or requiring substantial resources.

LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution
