
BigMac: Optimizing Communication in LLM Architecture
Reducing a key communication bottleneck in AI training and inference
BigMac introduces a communication-efficient structure for Mixture-of-Experts (MoE) models that addresses the All-to-All communication bottleneck in large language models.
- Enhances the fine-grained expert structure of DeepSeekMoE without degrading model quality
- Reduces the communication overhead that typically limits scaling efficiency (a cost sketch follows this list)
- Enables faster training and inference while maintaining model quality
- Provides a practical engineering solution for more resource-efficient AI systems
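To make the bottleneck concrete, the back-of-envelope sketch below estimates how many bytes one rank moves through All-to-All dispatch and combine in a standard expert-parallel MoE layer, and how much that traffic shrinks if the representations crossing the network are lower-dimensional. The token count, hidden size, top-k, and reduced dimension are illustrative assumptions, not figures from the BigMac paper, and the low-dimensional variant only sketches the general direction of communication-efficient restructuring rather than BigMac's exact design.

```python
# Back-of-envelope estimate of All-to-All traffic in an expert-parallel MoE layer.
# The baseline formula is the standard dispatch/combine volume; the "reduced_dim"
# variant is an assumption used only to illustrate why shrinking the dimension
# that crosses the network cuts traffic -- it is not BigMac's exact construction.

def all_to_all_bytes(tokens: int, hidden: int, top_k: int,
                     bytes_per_elem: int = 2) -> int:
    """Bytes one rank sends for one dispatch plus one combine."""
    # Each routed token copy crosses the network twice (dispatch and combine).
    return 2 * tokens * top_k * hidden * bytes_per_elem

if __name__ == "__main__":
    tokens, hidden, top_k = 4096, 4096, 8        # illustrative sizes, not paper values
    baseline = all_to_all_bytes(tokens, hidden, top_k)

    # Hypothetical: communicate a lower-dimensional projection instead of the
    # full hidden state (the reduced dimension here is an assumption).
    reduced_dim = 512
    reduced = all_to_all_bytes(tokens, reduced_dim, top_k)

    print(f"baseline All-to-All per layer: {baseline / 2**20:.1f} MiB")
    print(f"low-dim All-to-All per layer:  {reduced / 2**20:.1f} MiB "
          f"({baseline / reduced:.0f}x less)")
```

With these illustrative sizes, a single MoE layer moves roughly 512 MiB per rank through All-to-All, which is why this step dominates step time at scale and why restructuring what gets communicated pays off.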
This matters for AI engineering because it targets a critical infrastructure limitation in distributed training of large models, potentially enabling more efficient development of next-generation language models on existing hardware.
Paper: BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference