DiSCo: Optimizing LLM Text Streaming

A device-server collaborative approach for better performance and lower costs

DiSCo introduces a hybrid architecture that intelligently distributes LLM processing between user devices and cloud servers to optimize text streaming services.

  • Addresses critical QoE metrics: Time-To-First-Token (TTFT) and Time-Between-Tokens (TBT)
  • Combines on-device small models for faster initial responses with server models for higher quality completions
  • Achieves up to 56% lower latency while reducing operational costs
  • Dynamically adapts to network conditions and query complexity
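The collaboration pattern described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual algorithm: `device_first_tokens`, `server_completion`, and the RTT threshold are invented stand-ins for the on-device small model, the cloud endpoint, and DiSCo's adaptive dispatch policy.

```python
# Hypothetical sketch of device-server collaborative streaming.
# The function names and the fixed RTT threshold are illustrative
# assumptions, not DiSCo's real interfaces or policy.

def device_first_tokens(prompt, n=4):
    """Simulate a small on-device model emitting the first tokens quickly,
    covering Time-To-First-Token while the server request is in flight."""
    return [f"dev_tok{i}" for i in range(n)]

def server_completion(prompt, prefix):
    """Simulate the server model continuing from the device-generated
    prefix with a higher-quality completion."""
    return prefix + ["srv_tok0", "srv_tok1"]

def stream_response(prompt, network_rtt_ms, rtt_threshold_ms=150):
    """Route based on measured network conditions: on a slow round trip,
    let the device model cover TTFT, then splice in server tokens."""
    if network_rtt_ms > rtt_threshold_ms:
        prefix = device_first_tokens(prompt)       # fast first tokens locally
        return server_completion(prompt, prefix)   # server finishes the response
    return server_completion(prompt, [])           # fast network: server only

tokens = stream_response("hello", network_rtt_ms=200)
print(tokens)
```

On a slow link the user sees device tokens almost immediately, and the server's tokens arrive behind them; on a fast link the device model is skipped entirely, which is the cost-saving side of the trade-off.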

This research gives engineering teams a practical framework for balancing performance and cost in LLM deployments, especially for applications that require real-time interaction at scale.

DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services
