Engineering Reliable LLM Accelerators

ReaLM introduces a statistical approach to fault tolerance in LLM hardware accelerators that reduces protection overhead while maintaining reliability.

Analyzes the inherent fault tolerance of LLMs to determine which computational steps are most vulnerable to hardware faults
Implements selective protection using algorithm-based fault tolerance (ABFT) only on critical operations
Achieves 99.8% fault detection rate while reducing overhead by 62.3% compared to conventional methods
Demonstrates 1.47× speedup and energy savings of 30.1% for LLM inference
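
To make the ABFT idea concrete, here is a minimal sketch of the classic checksum test for a matrix product, applied only when an operation is marked critical. It is not ReaLM's implementation: the function names, the `critical` flag, the `tol` threshold, and the `inject_fault` hook are all hypothetical, and the summary above does not say how ReaLM chooses its critical operations or thresholds.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def column_checksum_mismatch(a, b, c):
    """Per-column |expected - observed| for a claimed product c = a @ b.

    Classic ABFT identity: the column sums of A @ B equal
    (column sums of A) @ B, so one extra vector-matrix product
    is enough to check the whole result.
    """
    a_colsum = [sum(col) for col in zip(*a)]      # e^T A
    expected = matmul([a_colsum], b)[0]           # (e^T A) B
    observed = [sum(col) for col in zip(*c)]      # e^T C
    return [abs(e - o) for e, o in zip(expected, observed)]


def protected_matmul(a, b, critical=True, tol=1e-6, inject_fault=None):
    """Multiply a and b; run the checksum test only if `critical` is set.

    `inject_fault` is a hypothetical (row, col, delta) hook that corrupts
    one output element to emulate a hardware fault.
    Returns (result, fault_detected).
    """
    c = matmul(a, b)
    if inject_fault is not None:
        i, j, delta = inject_fault
        c[i][j] += delta
    if not critical:
        # Assumed policy: non-critical operations skip the check entirely.
        return c, False
    fault_detected = max(column_checksum_mismatch(a, b, c)) > tol
    return c, fault_detected


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(protected_matmul(A, B))
    print(protected_matmul(A, B, inject_fault=(0, 1, 3)))
    print(protected_matmul(A, B, critical=False, inject_fault=(0, 1, 3)))
```

The checksum costs one extra vector-matrix product per protected operation, which is why skipping it on operations that tolerate faults can reduce overhead. In floating-point hardware, `tol` would have to absorb rounding error, so its value here is only a placeholder.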

This research enables more efficient and reliable LLM deployment in resource-constrained environments, addressing a key obstacle to running AI applications on embedded systems.

ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance