
Binary Neural Networks: Shrinking LLMs
Optimizing large language models through extreme quantization
This survey explores how binary neural networks can dramatically reduce the computational and memory requirements of large language models while maintaining acceptable performance.
- Binary quantization can reduce model size by up to 32x compared to full-precision (FP32) models, since each weight shrinks from 32 bits to a single bit (see the sketch after this list)
- Current techniques include Binary Representation Learning and Binary Reasoning Enhancement
- Challenges remain in preserving model capabilities during extreme compression
- Binary LLMs offer potential for edge device deployment and resource-constrained environments
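
To make the 32x figure concrete, here is a minimal sketch of binary weight quantization with a per-tensor scaling factor, in the style of XNOR-Net-like methods. It assumes NumPy; the function names and the bit-packing scheme are illustrative, not taken from the survey.

```python
import numpy as np

def binarize(weights: np.ndarray):
    """Quantize float32 weights to {-1, +1} with a per-tensor scale.

    The scale alpha (mean absolute value) preserves the overall
    magnitude of the original tensor, a common trick for reducing
    the accuracy loss of extreme quantization.
    """
    alpha = np.abs(weights).mean()   # scalar scale factor
    binary = np.sign(weights)        # 1 bit of information per weight
    binary[binary == 0] = 1.0        # map exact zeros to +1 by convention
    return binary, alpha

def pack_bits(binary: np.ndarray) -> np.ndarray:
    """Pack {-1, +1} values into bytes: 8 weights per uint8."""
    bits = (binary.ravel() > 0).astype(np.uint8)
    return np.packbits(bits)

if __name__ == "__main__":
    w = np.random.randn(4096, 4096).astype(np.float32)
    b, alpha = binarize(w)
    packed = pack_bits(b)
    # FP32 costs 4 bytes per weight; packed binary costs 1/8 byte,
    # hence the 32x reduction in weight storage.
    print(f"fp32 bytes:   {w.nbytes:,}")
    print(f"packed bytes: {packed.nbytes:,}")
    print(f"reduction:    {w.nbytes / packed.nbytes:.0f}x")
    # Dequantized approximation used at inference: alpha * sign(W)
    w_hat = alpha * b
    print(f"recon MSE:    {np.mean((w - w_hat) ** 2):.4f}")
```

On a single 4096x4096 layer this prints a 32x reduction in weight storage; in a full model, components typically kept at higher precision (embeddings, normalization, activations) lower the end-to-end ratio somewhat.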
These techniques promise more efficient model deployment, reduced inference costs, and broader access to LLM technology across computing environments.