
Boosting LLM Efficiency Through Smart Resource Sharing
How HyGen intelligently co-locates online and offline workloads
HyGen is a novel LLM serving system that improves resource utilization by intelligently managing both latency-sensitive (online) and throughput-oriented (offline) workloads on the same infrastructure.
- Eliminates the inefficiency of dedicating separate machines to different workload types
- Uses interference-aware scheduling to maintain service-level objectives (SLOs)
- Implements elastic resource allocation between online and offline tasks
- Achieves better overall system utilization while preserving performance guarantees
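To make the idea of interference-aware, elastic co-location concrete, here is a minimal Python sketch. It is not HyGen's actual scheduler, which this summary does not describe in detail. The names `Request`, `predict_latency_ms`, and `build_batch`, along with the linear cost model and its constants, are all assumptions chosen for illustration. The sketch admits every queued online request first, then adds offline requests only while a predicted batch latency stays within the online SLO.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    req_id: str
    tokens: int   # tokens this request would add to the next batch
    online: bool  # True for latency-sensitive traffic


def predict_latency_ms(total_tokens: int) -> float:
    """Hypothetical linear cost model: fixed overhead plus per-token cost."""
    return 5.0 + 0.02 * total_tokens


def build_batch(online_q: deque, offline_q: deque, slo_ms: float) -> list:
    """Admit all queued online requests, then add offline requests
    only while the predicted batch latency stays within the SLO."""
    batch = list(online_q)
    online_q.clear()
    total_tokens = sum(r.tokens for r in batch)

    while offline_q:
        candidate = offline_q[0]
        if predict_latency_ms(total_tokens + candidate.tokens) > slo_ms:
            break  # more offline work would risk the online SLO
        batch.append(offline_q.popleft())
        total_tokens += candidate.tokens
    return batch


# Example: one online request plus a backlog of offline requests.
online_q = deque([Request("on-1", 500, True)])
offline_q = deque(Request(f"off-{i}", 1000, False) for i in range(1, 4))
batch = build_batch(online_q, offline_q, slo_ms=50.0)
print([r.req_id for r in batch])
```

The share of each batch given to offline work is elastic in this sketch: when online traffic is light, more offline requests fit under the same latency budget, and when online traffic grows, offline work is deferred rather than dropped. A real system would replace the toy cost model with a predictor calibrated on measured latencies.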
For engineering teams managing LLM infrastructure, this research points to more cost-effective deployment without sacrificing responsiveness for user-facing applications.
HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location