
COMET: Accelerating AI with Efficient MoE Communication
Solving the communication bottleneck in trillion-parameter AI models
COMET introduces a fine-grained computation-communication overlapping system that significantly reduces communication overhead in Mixture-of-Experts (MoE) models.
- Addresses a critical bottleneck where inter-device communication can consume 47% of execution time in large MoE models
- Employs data dependency analysis and fine-grained task rescheduling to overlap communication with expert computation (a simplified sketch of the idea follows this list)
- Achieves up to 1.76x speedup in MoE layer execution compared to state-of-the-art methods
- Enables more efficient scaling of trillion-parameter language models without proportional increases in computational costs
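To make the overlapping idea concrete, here is a minimal PyTorch-style sketch of chunked dispatch-compute pipelining, the coarse-grained analogue of what COMET does. It is illustrative only: the function name `moe_layer_chunked_overlap`, the `expert_mlp` callable, and the even token split are assumptions, and COMET itself fuses communication and computation at thread-block granularity inside GPU kernels rather than pipelining at the Python level.

```python
import torch
import torch.distributed as dist


def moe_layer_chunked_overlap(tokens, expert_mlp, group, num_chunks=4):
    """Illustrative pipeline: overlap the all-to-all dispatch of later
    token chunks with expert computation on earlier chunks.

    tokens     : [num_tokens, hidden] activations, assumed already routed
                 and evenly divisible so a plain all-to-all is valid.
    expert_mlp : callable applying the local experts to one chunk
                 (hypothetical stand-in for the expert FFN).
    group      : torch.distributed process group used for expert parallelism.
    """
    chunks = tokens.chunk(num_chunks, dim=0)
    dispatched = [torch.empty_like(c) for c in chunks]
    handles = []

    # Launch the asynchronous dispatch (all-to-all) for every chunk up front;
    # NCCL processes them in order on its communication stream.
    for src, dst in zip(chunks, dispatched):
        handles.append(
            dist.all_to_all_single(dst, src, group=group, async_op=True)
        )

    outputs = []
    for i, handle in enumerate(handles):
        # Wait only for the chunk we are about to compute on; later chunks
        # keep communicating while this one runs through the experts.
        handle.wait()
        outputs.append(expert_mlp(dispatched[i]))

    # The combine step (reverse all-to-all) is omitted for brevity; it can
    # be pipelined symmetrically.
    return torch.cat(outputs, dim=0)
```

The point of the pipeline is that while the local experts are computing on one chunk, the next chunk's all-to-all transfer is already in flight, so communication latency is hidden behind computation instead of being serialized before it.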
This research is crucial for engineering teams building large-scale AI systems, as it provides a practical approach to mitigate communication overhead—one of the primary bottlenecks in distributed MoE deployment.
Paper: Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts