
Hidden Threats in AI Systems
Token-level backdoor attacks against multi-modal LLMs
This research introduces BadToken, a new, highly effective method for injecting backdoors into multi-modal large language models without being detected.
- Successfully attacks MLLMs in plug-and-play scenarios without fine-tuning
- Achieves up to 98.67% attack success rate while maintaining remarkable stealthiness
- Works by targeting specific tokens rather than entire inputs, making detection extremely difficult
- Demonstrates significant security vulnerabilities in models used for critical applications like autonomous driving and medical diagnosis
This research highlights urgent security concerns for organizations deploying MLLMs in production environments, especially for high-stakes applications where malicious outputs could have serious consequences.
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models