
The Deception Risk in LLM Mixtures
Exposing Vulnerabilities in Collaborative AI Systems
This research presents the first comprehensive analysis of how deceptive agents can compromise Mixture of LLMs (MoA) architectures—systems that leverage multiple AI models working together.
- A single malicious LLM can significantly degrade performance across the entire system
- Defense mechanisms (model voting, distillation) provide only partial protection
- Vulnerabilities exist even in leading collaborative AI architectures
- Deceptive information propagates through the system, affecting final outputs
Why This Matters: As organizations deploy multi-model AI solutions, understanding these security vulnerabilities becomes critical for building robust, trustworthy systems that can resist manipulation attempts.
This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs