Multilingual Safety for AI Assistants

Precision-targeting language-specific vulnerabilities in LLMs

Soteria is a lightweight approach to improving LLM safety across multiple languages: it adjusts only the parameters most responsible for harmful outputs in each language.

  • Identifies and adjusts only the functional heads most responsible for generating harmful content
  • Achieves significant safety improvements while modifying only a small fraction of the model's parameters
  • Maintains model performance across languages, even in low-resource settings
  • Introduces XThreatBench as a specialized benchmark for multilingual safety evaluation
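
The head-selection idea in the first two bullets can be illustrated with a minimal sketch. It assumes each attention head already has a per-language harm-attribution score; the scores, language codes, head counts, and the function names `select_heads` and `fraction_modified` are all hypothetical and chosen for illustration, and this is not the scoring or steering procedure used in the paper.

```python
from typing import Dict, List, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def select_heads(scores: Dict[Head, float], top_k: int) -> List[Head]:
    """Return the top_k heads with the highest harm-attribution score.

    Ties are broken by (layer, head) order so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [head for head, _ in ranked[:top_k]]


def fraction_modified(selected: List[Head], params_per_head: int,
                      total_params: int) -> float:
    """Share of all model parameters that belong to the selected heads."""
    return len(selected) * params_per_head / total_params


# Hypothetical per-language scores for a toy model with 2 layers x 2 heads.
# These numbers are invented for illustration only.
scores_by_language: Dict[str, Dict[Head, float]] = {
    "hi": {(0, 0): 0.10, (0, 1): 0.80, (1, 0): 0.65, (1, 1): 0.05},
    "bn": {(0, 0): 0.70, (0, 1): 0.20, (1, 0): 0.15, (1, 1): 0.90},
}

# Each language gets its own set of heads to adjust.
selected_by_language = {
    lang: select_heads(scores, top_k=2)
    for lang, scores in scores_by_language.items()
}

# Assumed toy sizes: 1,000 parameters per head, 8,000 parameters in total.
touched = fraction_modified(selected_by_language["hi"],
                            params_per_head=1000, total_params=8000)

for lang, heads in selected_by_language.items():
    print(lang, heads)
print(f"fraction of parameters touched for 'hi': {touched:.2%}")
```

Keeping a separate head set per language is what makes the intervention language-specific: heads outside the selected set are left untouched, which is why the rest of the model's behavior can be expected to change little.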

This research addresses a critical security challenge for deploying LLMs globally, enabling organizations to implement targeted safety controls without compromising overall model utility or requiring complete retraining.

Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment
