
Optimizing LLMs for Edge Computing
Addressing the challenges of deploying powerful AI on resource-constrained devices
This survey examines how large language models (LLMs) can be deployed effectively on edge devices despite their computational and memory limitations.
Key findings:
- Edge LLMs require compact, resource-efficient model designs to run on devices with limited processing power and memory
- Pre-deployment strategies and runtime inference optimizations are critical for practical implementation
- Solutions must accommodate the hardware heterogeneity of diverse edge platforms
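To make the pre-deployment strategies above concrete, the sketch below shows symmetric per-tensor int8 weight quantization, one common technique for shrinking a model's memory footprint before edge deployment. This is an illustrative example, not a method from the survey itself; the function names and the toy weight matrix are assumptions for demonstration.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127].
    Illustrative sketch, not the survey's specific method."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for use at inference time."""
    return q.astype(np.float32) * scale

# Toy example: weights shrink from 4 bytes (float32) to 1 byte (int8) each.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = np.max(np.abs(w - w_hat))  # rounding error is bounded by scale / 2
```

A 4x reduction in weight storage like this is often what makes an LLM fit within an edge device's memory budget at all; runtime optimizations (e.g. efficient int8 kernels) then exploit the smaller format for faster inference.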
This research matters for engineering teams building AI applications for smartphones, IoT devices, and other edge computing scenarios where on-device processing offers privacy, latency, and connectivity advantages.
A Review on Edge Large Language Models: Design, Execution, and Applications