Breaking Memory Barriers for Mobile LLMs

Enabling larger language models on memory-constrained devices

ActiveFlow is a framework for running larger language models on mobile devices. It keeps only the actively used weights in DRAM and swaps the rest to and from flash storage.

  • Introduces an active-weight swapping technique to overcome device DRAM limitations
  • Achieves adaptive memory usage for modern transformer-based LLMs
  • Implements cross-layer weight preloading for optimized performance
  • Demonstrates practical scaling of deployable model sizes on standard smartphones
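
The general idea behind the first and third points can be illustrated with a toy sketch. This is not ActiveFlow's implementation: the `WeightCache` class, the LRU eviction policy, the fixed `active` table (standing in for whatever predicts the active weights), and the dict that plays the role of flash are all assumptions made for illustration. Preloading here is sequential, whereas a real system would presumably overlap it with computation.

```python
from collections import OrderedDict


class WeightCache:
    """Toy bounded 'DRAM' cache in front of a 'flash' weight store (hypothetical)."""

    def __init__(self, flash, capacity):
        self.flash = flash          # dict (layer, channel) -> weight; stands in for flash
        self.capacity = capacity    # maximum number of weights held in "DRAM"
        self.dram = OrderedDict()   # least recently used entry comes first
        self.flash_reads = 0        # counts simulated flash accesses

    def load(self, key):
        if key in self.dram:                 # hit: refresh recency, no flash access
            self.dram.move_to_end(key)
            return self.dram[key]
        self.flash_reads += 1                # miss: read from flash into DRAM
        self.dram[key] = self.flash[key]
        if len(self.dram) > self.capacity:   # over budget: evict least recently used
            self.dram.popitem(last=False)
        return self.dram[key]

    def preload(self, keys):
        for key in keys:
            self.load(key)


def run(cache, active):
    """Process layers in order, preloading the next layer's active weights."""
    layers = sorted(active)
    cache.preload((layers[0], c) for c in active[layers[0]])
    outputs = []
    for i, layer in enumerate(layers):
        # Touch the current layer's weights first so they are the most recent.
        weights = [cache.load((layer, c)) for c in active[layer]]
        if i + 1 < len(layers):
            nxt = layers[i + 1]
            cache.preload((nxt, c) for c in active[nxt])
        outputs.append(sum(weights))         # placeholder for the layer computation
    return outputs


# 3 layers x 4 channels live in "flash"; only 4 weights fit in "DRAM".
flash = {(l, c): l * 10 + c for l in range(3) for c in range(4)}
active = {0: [0, 1], 1: [1, 2], 2: [0, 3]}   # assumed active channels per layer
cache = WeightCache(flash, capacity=4)
outputs = run(cache, active)
```

In this toy run only the six active weights are ever read from flash, out of twelve stored, and the cache never holds more than four at once. The layer "computation" is a plain sum used as a placeholder.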

This work addresses a critical bottleneck in mobile AI: limited device DRAM. It allows on-device inference of more powerful language models while maintaining acceptable performance, which matters for privacy-preserving applications and offline use.

Original Paper: Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash
