Lost in Translation: Safety Gaps in Multilingual LLMs

How LLM safety measures deteriorate across languages

M-ALERT, a comprehensive benchmark for evaluating LLM safety across five European languages, reveals critical safety deterioration outside English.

  • Safety guardrails weaken significantly in non-English languages
  • Safety performance drops by up to 32% when moving from English to other languages
  • Top-performing models still exhibit concerning safety gaps across languages
  • Even when models refuse harmful English requests, they often comply with translations of the same content
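
As a rough illustration of how such per-language gaps can be quantified, the sketch below computes safe-response rates per language and their difference from English. This is not the official M-ALERT evaluation code; the record format, category names, and safety judgments are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical evaluation records: (language, prompt category, response judged safe).
# In practice these would come from translated safety prompts and an automated judge.
records = [
    ("en", "crime_propaganda", True),
    ("de", "crime_propaganda", False),
    ("it", "crime_propaganda", True),
    ("en", "substance_drug", True),
    ("fr", "substance_drug", False),
    ("es", "substance_drug", True),
]

def safety_scores(records):
    """Return the fraction of responses judged safe, per language."""
    totals, safe = defaultdict(int), defaultdict(int)
    for lang, _category, judged_safe in records:
        totals[lang] += 1
        safe[lang] += int(judged_safe)
    return {lang: safe[lang] / totals[lang] for lang in totals}

scores = safety_scores(records)
english = scores["en"]
for lang, score in sorted(scores.items()):
    # A positive gap means the model is less safe in this language than in English.
    print(f"{lang}: safe={score:.1%}  gap_vs_en={english - score:+.1%}")
```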

This research highlights urgent safety concerns for global AI deployment, showing that current safety measures are largely English-centric and do not protect non-English users equally.

LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
