Accelerating LLMs with Hybrid Processing

80x faster, 70% more energy-efficient 1-bit LLMs

PIM-LLM combines analog processing-in-memory and digital systolic arrays to dramatically accelerate 1-bit large language models.

  • Achieves ~80x improvement in tokens per second
  • Delivers 70% increase in tokens per joule (energy efficiency)
  • Runs the low-precision matrix multiplications of the projection layers on the analog processing-in-memory side
  • Runs the high-precision matrix multiplications of the attention heads on the digital systolic arrays (see the sketch after this list)

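A minimal NumPy sketch of that workload split is below. It only emulates the arithmetic contrast the bullets describe (1-bit weights for projection matmuls vs. higher-precision attention matmuls); the function names, the BitNet-style sign-and-scale binarization, and the hardware-mapping comments are illustrative assumptions, not the paper's implementation or API.

```python
# Illustrative sketch only: emulates in NumPy the numerical split described above.
# The routing of each matmul to "analog PIM" vs. "digital systolic array" is an
# assumption for illustration, not PIM-LLM's actual interface.
import numpy as np

def binarize(w):
    """1-bit weights: sign (+/-1) plus a per-tensor scale (BitNet-style assumption)."""
    scale = np.mean(np.abs(w))
    return np.sign(w), scale

def pim_projection(x, w):
    """Projection-layer matmul with 1-bit weights -> candidate for the analog PIM side."""
    w_bin, scale = binarize(w)
    return (x @ w_bin) * scale

def systolic_attention(q, k, v):
    """Attention matmuls kept at higher precision -> candidate for the digital systolic array."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

# Toy forward pass for one attention block (dimensions are arbitrary).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 64))            # 8 tokens, hidden size 64
w_q, w_k, w_v = (rng.standard_normal((64, 64)) for _ in range(3))

q = pim_projection(x, w_q)                  # low-precision work -> analog PIM
k = pim_projection(x, w_k)
v = pim_projection(x, w_v)
out = systolic_attention(q, k, v)           # high-precision work -> systolic array
print(out.shape)                            # (8, 64)
```
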
This hybrid architecture is a notable step toward deploying efficient LLMs in resource-constrained environments, enabling faster inference with lower power consumption for edge computing and mobile applications.

PIM-LLM: A High-Throughput Hybrid PIM Architecture for 1-bit LLMs
