
Optimizing LLM Inference Systems
Mathematical queuing models for maximizing LLM throughput
This research develops queuing-theoretic foundations for LLM inference systems, with the goal of maximizing throughput as demand for LLMs and AI agents grows.
- Addresses a critical gap between queuing theory and LLM system engineering
- Develops mathematical models specifically for LLM inference optimization
- Evaluates against real-world systems including Orca, Sarathi-Serve, and vLLM
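The formal queuing models are developed in the paper itself; as a rough illustration of why scheduling policy matters for LLM throughput, the sketch below compares request-level (static) batching with the continuous, iteration-level batching popularized by systems such as Orca and vLLM, using a toy discrete-time decode model. The simulate function, its parameters, and the uniform request-length assumption are illustrative only and are not taken from this work.

```python
import random

def simulate(num_requests=1000, batch_size=8, max_tokens=128, seed=0):
    """Toy discrete-time decode model: each request needs a random number of
    decode steps; schedulers differ in when a finished request's slot is reused."""
    random.seed(seed)
    lengths = [random.randint(1, max_tokens) for _ in range(num_requests)]

    # Request-level (static) batching: a batch runs until its longest request
    # finishes, so slots of short requests sit idle in the meantime.
    static_steps = 0
    for i in range(0, num_requests, batch_size):
        static_steps += max(lengths[i:i + batch_size])

    # Continuous (iteration-level) batching: a finished request's slot is
    # refilled immediately, so total steps ~ ceil(total tokens / batch size)
    # while the queue stays busy.
    total_tokens = sum(lengths)
    continuous_steps = -(-total_tokens // batch_size)  # ceiling division

    return static_steps, continuous_steps

static_steps, continuous_steps = simulate()
print(f"static batching:     {static_steps} decode steps")
print(f"continuous batching: {continuous_steps} decode steps")
print(f"throughput gain:     {static_steps / continuous_steps:.2f}x")
```

Under these toy assumptions, continuous batching finishes the same workload in far fewer decode steps, which is the kind of scheduling effect the queuing models aim to capture and optimize.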
As organizations deploy more LLM-powered applications, the resulting throughput-optimal scheduling algorithms enable more efficient resource utilization and better system performance at scale.
Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents