
Optimizing LLM Inference Systems
Mathematical queuing models for maximizing LLM throughput
This research develops queuing-theoretic foundations for LLM inference systems, with the goal of maximizing throughput as demand for LLMs and AI agents grows.
- Addresses a critical gap between queuing theory and LLM system engineering
- Develops mathematical models specifically for LLM inference optimization
- Evaluates against real-world systems including Orca, Sarathi-Serve, and vLLM
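The formal queuing models are developed in the paper itself; as a rough illustration of why scheduling policy matters for LLM throughput, the sketch below compares request-level (static) batching with the continuous, iteration-level batching popularized by systems such as Orca and vLLM, using a toy discrete-time decode model. The simulate function, its parameters, and the uniform request-length assumption are illustrative only and are not taken from this work.

```python
import random

def simulate(num_requests=1000, batch_size=8, max_tokens=128, seed=0):
    """Toy discrete-time decode model: each request needs a random number of
    decode steps; schedulers differ in when a finished request's slot is reused."""
    random.seed(seed)
    lengths = [random.randint(1, max_tokens) for _ in range(num_requests)]

    # Request-level (static) batching: a batch runs until its longest request
    # finishes, so slots of short requests sit idle in the meantime.
    static_steps = 0
    for i in range(0, num_requests, batch_size):
        static_steps += max(lengths[i:i + batch_size])

    # Continuous (iteration-level) batching: a finished request's slot is
    # refilled immediately, so total steps ~ ceil(total tokens / batch size)
    # while the queue stays busy.
    total_tokens = sum(lengths)
    continuous_steps = -(-total_tokens // batch_size)  # ceiling division

    return static_steps, continuous_steps

static_steps, continuous_steps = simulate()
print(f"static batching:     {static_steps} decode steps")
print(f"continuous batching: {continuous_steps} decode steps")
print(f"throughput gain:     {static_steps / continuous_steps:.2f}x")
```

Under these toy assumptions, continuous batching finishes the same workload in far fewer decode steps, which is the kind of scheduling effect the queuing models aim to capture and optimize.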
As organizations deploy more LLM-powered applications, the resulting throughput-optimal scheduling algorithms enable more efficient resource utilization and better system performance at scale.
Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents