Safety Across Languages: The Hidden Gap in LLM Alignment

This research reveals critical gaps in how LLM safety alignment, primarily developed in English, performs across multiple languages.

Current alignment tuning methods developed in English do not generalize equally to all languages
Researchers identified a distinct "safety space" within LLMs that constrains their outputs differently per language
Non-English languages often receive weaker safety constraints, creating security vulnerabilities
Findings suggest alignment methods need language-specific approaches rather than assuming English-based safety transfers universally

For security teams, this research highlights the importance of testing LLM safety in all deployment languages rather than assuming English safety evaluations are sufficient.

The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context