
Optimizing LLMs for Edge Computing
Balancing performance and resource constraints at the network edge
This research addresses the challenge of deploying powerful Large Language Models (LLMs) on resource-constrained edge devices in smart cities while maintaining model performance.
- Proposes the DILEMMA framework for jointly optimizing LLM quantization and distributed inference (a minimal sketch of the idea follows this summary)
- Tackles the critical tradeoff between model performance and edge resource limitations
- Enables more efficient distribution of LLM components across edge networks
- Reduces latency for end users while preserving model accuracy
For engineering teams, this approach opens new possibilities for deploying advanced AI capabilities closer to users, reducing response times and network load for smart-city applications.
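
To make the joint optimization concrete: the sketch below brute-forces a toy version of the problem, picking a quantization bit-width and an edge-node placement for each layer so as to minimize pipeline latency under per-node memory budgets and an overall accuracy-penalty budget. This is a rough illustration only, not the paper's formulation; all names and numbers (LAYER_GB, BITS, NODES, HOP_MS, MAX_PENALTY) are hypothetical placeholders.

```python
"""Illustrative sketch (not the authors' code): jointly choose a
per-layer bit-width and an edge-node placement for a pipelined LLM
by exhaustive search over a tiny toy instance."""

from itertools import product

# Toy model: four transformer blocks, FP16 sizes in GB (hypothetical).
LAYER_GB = [1.2, 1.2, 1.2, 1.2]

# Candidate bit-widths and an assumed accuracy penalty per layer.
BITS = {16: 0.000, 8: 0.002, 4: 0.010}

# Two edge nodes: (memory capacity in GB, per-layer compute latency in ms).
NODES = [(2.0, 30.0), (4.0, 45.0)]
HOP_MS = 12.0       # assumed network latency when activations cross nodes
MAX_PENALTY = 0.02  # accuracy-penalty budget for the whole model


def evaluate(bits, placement):
    """Return (latency_ms, penalty), or None if a node exceeds its memory."""
    used = [0.0] * len(NODES)
    latency, penalty = 0.0, 0.0
    for layer, (b, node) in enumerate(zip(bits, placement)):
        used[node] += LAYER_GB[layer] * b / 16.0  # memory shrinks with bits
        latency += NODES[node][1]
        penalty += BITS[b]
        if layer > 0 and node != placement[layer - 1]:
            latency += HOP_MS                     # inter-node activation hop
    if any(u > NODES[n][0] for n, u in enumerate(used)):
        return None
    return latency, penalty


# Exhaustive search: feasible only for toy sizes; a real system would
# use an ILP solver or a heuristic over this same search space.
best = None
for bits in product(BITS, repeat=len(LAYER_GB)):
    for placement in product(range(len(NODES)), repeat=len(LAYER_GB)):
        result = evaluate(bits, placement)
        if result and result[1] <= MAX_PENALTY:
            if best is None or result[0] < best[0]:
                best = (result[0], bits, placement)

if best:
    print(f"latency={best[0]:.0f} ms  bits={best[1]}  placement={best[2]}")
```

The point of the toy search is the coupling: lower bit-widths shrink a layer's memory footprint, which changes which placements are feasible, which in turn changes the latency from inter-node hops, so quantization and placement cannot be optimized independently.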
DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems