
Optimizing LLMs for Edge Computing
Addressing the challenges of deploying powerful AI on resource-constrained devices
This survey examines how large language models (LLMs) can be deployed effectively on edge devices despite their computational and memory limitations.
Key findings:
- Edge LLMs require compact, resource-efficient model designs to run on devices with limited processing power and memory
- Pre-deployment strategies and runtime inference optimizations are critical for practical implementation
- Solutions must accommodate the hardware heterogeneity of diverse edge platforms
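To make the pre-deployment strategies above concrete, the sketch below shows symmetric per-tensor int8 weight quantization, one common technique for shrinking a model's memory footprint before edge deployment. This is an illustrative example, not a method from the survey itself; the function names and the toy weight matrix are assumptions for demonstration.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127].
    Illustrative sketch, not the survey's specific method."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for use at inference time."""
    return q.astype(np.float32) * scale

# Toy example: weights shrink from 4 bytes (float32) to 1 byte (int8) each.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = np.max(np.abs(w - w_hat))  # rounding error is bounded by scale / 2
```

A 4x reduction in weight storage like this is often what makes an LLM fit within an edge device's memory budget at all; runtime optimizations (e.g. efficient int8 kernels) then exploit the smaller format for faster inference.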
This research matters for engineering teams building AI applications for smartphones, IoT devices, and other edge computing scenarios where on-device processing offers privacy, latency, and connectivity advantages.
A Review on Edge Large Language Models: Design, Execution, and Applications