
Breaking Memory Barriers for Mobile LLMs
Enabling larger language models on memory-constrained devices
ActiveFlow is an inference framework that runs larger language models on mobile devices by swapping model weights between DRAM and flash storage, so that only the weights currently in use need to occupy DRAM.
- Introduces an active-weight swapping technique to overcome device DRAM limitations
- Achieves adaptive memory usage for modern transformer-based LLMs
- Implements cross-layer weight preloading for optimized performance
- Demonstrates practical scaling of deployable model sizes on standard smartphones
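To make the swapping idea concrete, the following is a minimal, illustrative sketch and not ActiveFlow's actual implementation. It assumes that "active" weights are those tied to the largest-magnitude input activations, that layers are square with residual connections, and that DRAM can be modeled as a small LRU cache in front of a flash store. All names here (`FlashStore`, `DramCache`, `top_k_channels`, `preload`, `forward`) are hypothetical, and the preloading step runs sequentially, whereas a real system would presumably overlap it with computation.

```python
from collections import OrderedDict


class FlashStore:
    """Stand-in for flash: holds every weight column and counts reads."""

    def __init__(self, layers):
        # layers[l][j] is the weight column for input channel j of layer l.
        self.layers = layers
        self.reads = 0

    def read(self, layer, channel):
        self.reads += 1
        return self.layers[layer][channel]


class DramCache:
    """Fixed-capacity LRU cache of weight columns (the assumed DRAM budget)."""

    def __init__(self, flash, capacity):
        self.flash = flash
        self.capacity = capacity
        self.columns = OrderedDict()

    def get(self, layer, channel):
        key = (layer, channel)
        if key in self.columns:
            self.columns.move_to_end(key)  # mark as most recently used
            return self.columns[key]
        value = self.flash.read(layer, channel)  # cache miss: go to flash
        self.columns[key] = value
        if len(self.columns) > self.capacity:
            self.columns.popitem(last=False)  # evict least recently used
        return value


def top_k_channels(x, k):
    """Indices of the k largest-magnitude activations (assumed 'active' set)."""
    return sorted(range(len(x)), key=lambda j: abs(x[j]), reverse=True)[:k]


def sparse_layer(cache, layer, x, k):
    """Approximate y = W x using only the columns of the active channels."""
    y = [0.0] * len(x)  # square layers assumed
    for j in top_k_channels(x, k):
        column = cache.get(layer, j)
        for i in range(len(y)):
            y[i] += x[j] * column[i]
    return y


def preload(cache, layer, x, k):
    """Guess a later layer's active channels from the current activations."""
    for j in top_k_channels(x, k):
        cache.get(layer, j)


def forward(cache, x, num_layers, k):
    for layer in range(num_layers):
        if layer + 1 < num_layers:
            preload(cache, layer + 1, x, k)  # sequential stand-in for async I/O
        y = sparse_layer(cache, layer, x, k)
        x = [a + b for a, b in zip(x, y)]  # residual connection
    return x


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]
    flash = FlashStore([identity, identity])
    cache = DramCache(flash, capacity=4)
    print(forward(cache, [4.0, 0.1, -3.0, 0.2], num_layers=2, k=2))
    print("flash reads:", flash.reads)
```

In this toy setup the cache holds only half of the model's columns, and the second layer is served from the cache only when the preload guess matches the channels that layer actually uses. The residual connection is what makes that guess plausible here, since the largest activations tend to persist from one layer to the next.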
This work addresses a critical bottleneck in mobile AI: it allows on-device inference of more powerful language models while maintaining acceptable performance, which matters for privacy-preserving applications and offline functionality.
Original Paper: Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash