Smart Routing for On-Device AI

Optimizing LLM Performance Through Uncertainty-Based Decision Making

This research introduces an uncertainty-based routing system that offloads queries from a small on-device language model to a more powerful cloud LLM when the small model is unlikely to answer them well, balancing efficiency with accuracy.

  • Enables efficient on-device AI while maintaining high-quality responses
  • Leverages uncertainty metrics to identify when small models lack confidence
  • Demonstrates improved performance across various tasks including reasoning and knowledge-intensive queries
  • Provides a framework that generalizes well to new domains and unseen tasks
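
The routing idea in the list above can be illustrated with a minimal sketch. It assumes the on-device model exposes a probability distribution for each generated token, and it uses mean token entropy as the uncertainty metric. That metric, the function names, and the 0.5 threshold are all illustrative assumptions, not the specific methods or settings evaluated in the paper.

```python
import math
from typing import Sequence


def mean_token_entropy(token_distributions: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (in nats) across per-token probability distributions."""
    if not token_distributions:
        raise ValueError("need at least one token distribution")
    total = 0.0
    for dist in token_distributions:
        # Skip zero-probability entries, since 0 * log(0) is taken as 0.
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_distributions)


def route(token_distributions: Sequence[Sequence[float]], threshold: float = 0.5) -> str:
    """Return "cloud" if the small model looks uncertain, otherwise "device".

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data for the target accuracy and offloading budget.
    """
    uncertainty = mean_token_entropy(token_distributions)
    return "cloud" if uncertainty > threshold else "device"


# Toy distributions over a 3-token vocabulary (made-up numbers).
confident = [[0.97, 0.02, 0.01], [0.99, 0.005, 0.005]]
uncertain = [[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]]

print(route(confident))
print(route(uncertain))
```

Lowering the threshold sends more queries to the cloud model, and raising it keeps more of them on the device, so the threshold is the main knob for the efficiency and accuracy trade-off described above.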

From a security perspective, this approach helps direct critical or high-stakes queries to the stronger model when the on-device model is uncertain, reducing the risk of unreliable AI responses in sensitive contexts while preserving device efficiency.

Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization
