Engineering Reliable LLM Accelerators

Statistical fault tolerance without compromising performance

ReaLM introduces a statistical approach to fault tolerance in LLM hardware accelerators that reduces protection overhead while maintaining reliability.

  • Analyzes the inherent fault tolerance of LLMs to determine which computational steps are most vulnerable to hardware faults
  • Implements selective protection using algorithm-based fault tolerance (ABFT) only on critical operations
  • Achieves 99.8% fault detection rate while reducing overhead by 62.3% compared to conventional methods
  • Demonstrates 1.47× speedup and energy savings of 30.1% for LLM inference
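
The selective ABFT idea in the points above can be illustrated with a minimal sketch of the classic column-checksum test for a matrix product. This is generic ABFT, not ReaLM's actual implementation: the function names, the `critical` flag, and the `threshold` parameter are all hypothetical, with `threshold` standing in for a statistically chosen tolerance below which a checksum deviation is accepted rather than treated as a fault.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_sums(m):
    """Sum each column of a matrix, giving its checksum row."""
    return [sum(col) for col in zip(*m)]


def abft_check(a, b, c, threshold=0.0):
    """Return True if c passes the column-checksum test for a @ b.

    The checksum row of a, multiplied by b, must equal the column sums of c.
    `threshold` is a hypothetical tolerance: deviations at or below it pass.
    """
    expected = matmul([column_sums(a)], b)[0]
    actual = column_sums(c)
    deviation = max(abs(e - x) for e, x in zip(expected, actual))
    return deviation <= threshold


def verify_if_critical(a, b, c, critical, threshold=0.0):
    """Run the checksum test only for operations flagged as critical.

    Non-critical operations skip the check entirely (assumed policy).
    """
    if not critical:
        return True
    return abft_check(a, b, c, threshold)


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = matmul(a, b)              # fault-free result
faulty = [row[:] for row in c]
faulty[0][1] += 16            # simulate a hardware fault in one output element

print(abft_check(a, b, c))                     # fault-free result passes
print(abft_check(a, b, faulty))                # exact check flags the fault
print(abft_check(a, b, faulty, threshold=20))  # loose tolerance accepts it
```

The checksum costs one extra vector-matrix product per protected multiplication, so skipping it on non-critical operations and tolerating small deviations on the rest is where a selective scheme of this kind would save overhead.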

This research enables more efficient and reliable LLM deployment in resource-constrained environments, addressing a critical challenge for widespread AI application in embedded systems.

ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance
