Lost in Translation: Safety Gaps in Multilingual LLMs

How LLM safety measures deteriorate across languages

M-ALERT, a comprehensive benchmark for evaluating LLM safety across five European languages, reveals critical safety deterioration outside English.

  • Safety guardrails weaken significantly in non-English languages
  • Safety performance drops by up to 32% when moving from English to other languages
  • Top-performing models still exhibit concerning safety gaps across languages
  • Even when models refuse harmful English requests, they often comply with translations of the same content
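
As a rough illustration of how such per-language gaps can be quantified, the sketch below computes safe-response rates per language and their difference from English. This is not the official M-ALERT evaluation code; the record format, category names, and safety judgments are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical evaluation records: (language, prompt category, response judged safe).
# In practice these would come from translated safety prompts and an automated judge.
records = [
    ("en", "crime_propaganda", True),
    ("de", "crime_propaganda", False),
    ("it", "crime_propaganda", True),
    ("en", "substance_drug", True),
    ("fr", "substance_drug", False),
    ("es", "substance_drug", True),
]

def safety_scores(records):
    """Return the fraction of responses judged safe, per language."""
    totals, safe = defaultdict(int), defaultdict(int)
    for lang, _category, judged_safe in records:
        totals[lang] += 1
        safe[lang] += int(judged_safe)
    return {lang: safe[lang] / totals[lang] for lang in totals}

scores = safety_scores(records)
english = scores["en"]
for lang, score in sorted(scores.items()):
    # A positive gap means the model is less safe in this language than in English.
    print(f"{lang}: safe={score:.1%}  gap_vs_en={english - score:+.1%}")
```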

This research highlights urgent safety concerns for global AI deployment, showing that current safety measures are largely English-centric and do not protect non-English users equally.

LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
