
Language Confusion in LLMs
New metrics reveal critical security vulnerabilities in multilingual LLM responses
This research introduces a quantitative metric for measuring language confusion in Large Language Models, the phenomenon in which a model unexpectedly responds in a language other than the one the prompt called for.
- LLMs exhibit systematic patterns of language confusion that mirror linguistic typology relationships
- Language confusion can be exploited through embedding inversion attacks
- Models show varying confusion levels: GPT-3.5 (19%), Claude (15%), PaLM-2 (9%)
- Cross-family language confusion (e.g., Germanic to Romance) presents heightened security risks; a simple way to flag such cases is sketched after the takeaway below
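
To make the idea of a quantitative confusion metric concrete, here is a minimal sketch of one way a confusion rate could be computed; it is not the paper's actual metric. The `langdetect` dependency, the line-level granularity, and the `confusion_rate` helper are illustrative assumptions, not details from the research.

```python
# Minimal sketch of a line-level language confusion rate.
# Assumptions for illustration only: langdetect is used for language
# identification, and confusion is counted per non-empty response line.
from langdetect import detect  # pip install langdetect


def confusion_rate(responses, expected_lang):
    """Fraction of non-empty response lines not detected as expected_lang."""
    total = 0
    confused = 0
    for response in responses:
        for line in response.splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                detected = detect(line)
            except Exception:
                # Lines too short or ambiguous to classify are skipped.
                continue
            total += 1
            if detected != expected_lang:
                confused += 1
    return confused / total if total else 0.0


# Example: the prompts asked for German ("de"), but the second response
# switches into English mid-answer, so its English line counts as confused.
responses = [
    "Die Hauptstadt von Frankreich ist Paris.",
    "Natürlich!\nHere is the answer you requested.",
]
print(f"Confusion rate: {confusion_rate(responses, 'de'):.2f}")
```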
Language confusion isn't just an amusing glitch—it represents a significant security vulnerability that adversaries could exploit to manipulate model responses or bypass content filters in multilingual settings.
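
Because cross-family switches are flagged as the higher-risk case, a monitoring layer could classify each detected confusion event by language family before deciding how to react. The sketch below assumes a small hand-maintained family table; the ISO codes shown are only an illustrative subset, and none of this comes from the paper itself.

```python
# Illustrative severity check for a detected language confusion event.
# The family table is a small hand-picked subset for demonstration only.
LANGUAGE_FAMILY = {
    "en": "Germanic", "de": "Germanic", "nl": "Germanic", "sv": "Germanic",
    "fr": "Romance", "es": "Romance", "it": "Romance", "pt": "Romance",
    "ru": "Slavic", "pl": "Slavic", "cs": "Slavic",
}


def confusion_severity(expected_lang: str, detected_lang: str) -> str:
    """Classify a confusion event as none, same-family, or cross-family."""
    if detected_lang == expected_lang:
        return "none"
    expected_family = LANGUAGE_FAMILY.get(expected_lang)
    detected_family = LANGUAGE_FAMILY.get(detected_lang)
    if expected_family is not None and expected_family == detected_family:
        return "same-family"
    return "cross-family"


print(confusion_severity("de", "nl"))  # same-family: both Germanic
print(confusion_severity("de", "fr"))  # cross-family: Germanic -> Romance
```

A downstream filter or guardrail could then treat cross-family events as higher priority, mirroring the risk ranking in the findings above.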