Optimizing LLM Inference Systems

Mathematical queuing models for maximizing LLM throughput

This research develops fundamental queuing theory for LLM inference systems, with the goal of maximizing throughput as demand for LLMs and AI agents grows.

  • Addresses a critical gap between queuing theory and LLM system engineering
  • Develops mathematical models specifically for LLM inference optimization
  • Evaluates against real-world systems including Orca, Sarathi-serve, and vLLM

As organizations deploy more LLM-powered applications, these throughput-optimal scheduling algorithms enable more efficient resource utilization and improved system performance at scale.
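As background for what "throughput-optimal" means in a queuing setting, the sketch below applies textbook M/M/1 formulas and Little's law to a hypothetical LLM serving scenario. The arrival and service rates are illustrative assumptions, not measurements, and this is standard queuing background rather than the models developed in this research: it simply shows that sustainable throughput is capped by the service rate and that latency grows sharply as utilization approaches 1.

```python
# Minimal sketch: standard M/M/1 queuing formulas applied to a hypothetical
# LLM serving scenario. The rates below are illustrative assumptions, not
# measurements, and this is textbook background rather than the scheduling
# models developed in this research.

def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Return utilization, mean requests in system, and mean latency (via Little's law)."""
    if arrival_rate >= service_rate:
        raise ValueError("Unstable queue: arrival rate must be below service rate")
    rho = arrival_rate / service_rate      # utilization
    l = rho / (1.0 - rho)                  # mean number of requests in system
    w = l / arrival_rate                   # mean latency, from Little's law: L = lambda * W
    return {"utilization": rho, "mean_in_system": l, "mean_latency_s": w}

# Hypothetical numbers: the server completes 10 requests/s when kept busy.
# Throughput is capped at that service rate, and latency blows up as the
# arrival rate approaches it.
for lam in (5.0, 8.0, 9.5):
    m = mm1_metrics(arrival_rate=lam, service_rate=10.0)
    print(f"lambda={lam:4.1f} req/s -> utilization={m['utilization']:.2f}, "
          f"mean latency={m['mean_latency_s']:.2f} s")
```

A throughput-optimal scheduler, in this vocabulary, is one that can keep the system stable for any arrival rate below the service capacity rather than leaving capacity stranded.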

Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents
