The Deception Risk in LLM Mixtures

This research presents the first comprehensive analysis of how deceptive agents can compromise Mixture of LLMs (MoA) architectures—systems that leverage multiple AI models working together.

A single malicious LLM can significantly degrade performance across the entire system
Defense mechanisms (model voting, distillation) provide only partial protection
Vulnerabilities exist even in leading collaborative AI architectures
Deceptive information propagates through the system, affecting final outputs

Why This Matters: As organizations deploy multi-model AI solutions, understanding these security vulnerabilities becomes critical for building robust, trustworthy systems that can resist manipulation attempts.

This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs