Language Confusion in LLMs

A new metric reveals critical security vulnerabilities in multilingual LLM responses

This research introduces a quantitative metric for measuring language confusion in Large Language Models, the phenomenon in which a model unexpectedly generates text in a language other than the one the user requested.
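
To make the idea concrete, here is a minimal sketch of how such a confusion rate could be computed, assuming a simple definition (the share of responses whose detected language differs from the requested one) and using the off-the-shelf langdetect library as a stand-in language identifier; the paper's actual metric and detector may be defined differently.

```python
# Minimal sketch of a language confusion rate, assuming a simple definition:
# the fraction of model responses whose detected language differs from the
# language requested in the prompt. langdetect is used here only as a
# stand-in language identifier, not necessarily what the paper uses.
from langdetect import DetectorFactory, LangDetectException, detect

DetectorFactory.seed = 0  # make langdetect's detection deterministic


def confusion_rate(responses, expected_lang):
    """Return the fraction of responses not in the expected ISO 639-1 language."""
    if not responses:
        return 0.0
    confused = 0
    for text in responses:
        try:
            detected = detect(text)
        except LangDetectException:
            detected = None  # undetectable output is counted as confused
        if detected != expected_lang:
            confused += 1
    return confused / len(responses)


# Example: two replies to German prompts, one of which drifts into English.
replies = [
    "Das ist eine kurze Antwort auf Deutsch.",
    "This reply unexpectedly switched to English.",
]
print(f"Language confusion rate: {confusion_rate(replies, 'de'):.0%}")  # 50%
```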

  • LLMs exhibit systematic patterns of language confusion that follow linguistic typological relationships
  • This vulnerability can be exploited through embedding inversion attacks
  • Models show varying confusion levels: GPT-3.5 (19%), Claude (15%), PaLM-2 (9%)
  • Cross-family language confusion (e.g., Germanic to Romance) presents heightened security risks

Language confusion isn't just an amusing glitch—it represents a significant security vulnerability that adversaries could exploit to manipulate model responses or bypass content filters in multilingual settings.

Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis
