
Optimizing LLMs for Edge Computing
Balancing performance and resource constraints at the network edge
This research addresses the challenge of deploying powerful Large Language Models (LLMs) on resource-constrained edge devices in smart cities while maintaining model performance.
- Proposes the DILEMMA framework for jointly optimizing LLM quantization and distributed inference (a minimal sketch of the idea follows this summary)
- Tackles the critical tradeoff between model performance and edge resource limitations
- Enables more efficient distribution of LLM components across edge networks
- Reduces latency for end users while preserving model accuracy
For engineering teams, this approach opens new possibilities for deploying advanced AI capabilities closer to users, reducing response times and network load for smart-city applications.
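
To make the joint optimization concrete: the sketch below brute-forces a toy version of the problem, picking a quantization bit-width and an edge-node placement for each layer so as to minimize pipeline latency under per-node memory budgets and an overall accuracy-penalty budget. This is a rough illustration only, not the paper's formulation; all names and numbers (LAYER_GB, BITS, NODES, HOP_MS, MAX_PENALTY) are hypothetical placeholders.

```python
"""Illustrative sketch (not the authors' code): jointly choose a
per-layer bit-width and an edge-node placement for a pipelined LLM
by exhaustive search over a tiny toy instance."""

from itertools import product

# Toy model: four transformer blocks, FP16 sizes in GB (hypothetical).
LAYER_GB = [1.2, 1.2, 1.2, 1.2]

# Candidate bit-widths and an assumed accuracy penalty per layer.
BITS = {16: 0.000, 8: 0.002, 4: 0.010}

# Two edge nodes: (memory capacity in GB, per-layer compute latency in ms).
NODES = [(2.0, 30.0), (4.0, 45.0)]
HOP_MS = 12.0       # assumed network latency when activations cross nodes
MAX_PENALTY = 0.02  # accuracy-penalty budget for the whole model


def evaluate(bits, placement):
    """Return (latency_ms, penalty), or None if a node exceeds its memory."""
    used = [0.0] * len(NODES)
    latency, penalty = 0.0, 0.0
    for layer, (b, node) in enumerate(zip(bits, placement)):
        used[node] += LAYER_GB[layer] * b / 16.0  # memory shrinks with bits
        latency += NODES[node][1]
        penalty += BITS[b]
        if layer > 0 and node != placement[layer - 1]:
            latency += HOP_MS                     # inter-node activation hop
    if any(u > NODES[n][0] for n, u in enumerate(used)):
        return None
    return latency, penalty


# Exhaustive search: feasible only for toy sizes; a real system would
# use an ILP solver or a heuristic over this same search space.
best = None
for bits in product(BITS, repeat=len(LAYER_GB)):
    for placement in product(range(len(NODES)), repeat=len(LAYER_GB)):
        result = evaluate(bits, placement)
        if result and result[1] <= MAX_PENALTY:
            if best is None or result[0] < best[0]:
                best = (result[0], bits, placement)

if best:
    print(f"latency={best[0]:.0f} ms  bits={best[1]}  placement={best[2]}")
```

The point of the toy search is the coupling: lower bit-widths shrink a layer's memory footprint, which changes which placements are feasible, which in turn changes the latency from inter-node hops, so quantization and placement cannot be optimized independently.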
DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems