
Hybrid LLM Systems for Faster, Smarter Inference
Optimizing AI Decision-Making with Threshold-Based Control
This research introduces a task-oriented Age of Information (AoI) framework that combines large and small language models to improve both timeliness and accuracy in remote inference systems.
- Addresses the trade-off between inference speed and accuracy through threshold-based model-selection policies (see the sketch after this list)
- Uses a Semi-Markov Decision Process to determine optimal model selection based on data freshness
- Demonstrates significant improvements in timeliness and accuracy compared to single-model approaches
- Provides a practical engineering solution for systems where both speed and accuracy are critical
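To make the threshold idea concrete, the minimal Python sketch below selects between a small (fast, less accurate) and a large (slow, more accurate) model based on the current Age of Information. The model profiles, threshold value, and function names are illustrative assumptions only; the paper's actual policy is derived from a Semi-Markov Decision Process rather than hand-picked numbers.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    """Hypothetical per-model characteristics used by the policy."""
    name: str
    mean_latency_s: float  # expected inference time (adds to staleness)
    accuracy: float        # expected inference accuracy


def select_model(current_aoi_s: float,
                 small: ModelProfile,
                 large: ModelProfile,
                 aoi_threshold_s: float) -> ModelProfile:
    """Threshold rule: if the data is already stale (AoI at or above the
    threshold), use the fast small model to restore freshness; otherwise
    spend the extra latency on the more accurate large model."""
    return small if current_aoi_s >= aoi_threshold_s else large


# Example usage with made-up numbers for illustration
small_llm = ModelProfile("small-llm", mean_latency_s=0.2, accuracy=0.85)
large_llm = ModelProfile("large-llm", mean_latency_s=1.5, accuracy=0.95)

for aoi in (0.5, 3.0):
    chosen = select_model(aoi, small_llm, large_llm, aoi_threshold_s=2.0)
    print(f"AoI={aoi:.1f}s -> use {chosen.name}")
```

In the paper's framework, the threshold itself is an output of the optimization (over data freshness, latency, and task accuracy), not a fixed constant as in this sketch.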
This work has practical implications for engineering, offering an optimized architecture for remote AI systems that must balance computational resource constraints with inference quality requirements.
Original Paper: Task-oriented Age of Information for Remote Inference with Hybrid Language Models