DRAGON: Boosting Small LMs on Edge Devices

The DRAGON framework helps small language models perform better on edge devices by retrieving knowledge from both cloud and local sources.

Distributed Retrieval: Efficiently accesses both cloud databases and private on-device documents
Speculative Aggregation: Reduces latency by optimizing how knowledge is retrieved and combined
Privacy-Preserving: Maintains security of sensitive user data while leveraging cloud resources
Resource-Efficient: Enables small LMs to approach larger model performance without intensive retraining
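
The distributed retrieval idea in the list above can be illustrated with a minimal sketch. It is not DRAGON's implementation: the word-overlap scorer, the `make_retriever` helper, and the `distributed_retrieve` function are all hypothetical names introduced here, and the sketch does not model speculative aggregation. It only shows one query sent to a cloud index and an on-device index in parallel, with the results merged on the device so that private documents never leave it (an assumption of this example).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Doc = Tuple[str, float]  # (text, relevance score); hypothetical format
Retriever = Callable[[str, int], List[Doc]]


def score(query: str, text: str) -> float:
    """Toy relevance: fraction of query words that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def make_retriever(corpus: List[str]) -> Retriever:
    """Build a top-k retriever over an in-memory corpus (stand-in for a real index)."""
    def retrieve(query: str, k: int) -> List[Doc]:
        scored = [(text, score(query, text)) for text in corpus]
        scored.sort(key=lambda d: d[1], reverse=True)
        return scored[:k]
    return retrieve


def distributed_retrieve(query: str, cloud: Retriever, local: Retriever, k: int = 2) -> List[Doc]:
    """Query both sources concurrently, then keep the k best results overall."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        cloud_future = pool.submit(cloud, query, k)
        local_future = pool.submit(local, query, k)
        merged = cloud_future.result() + local_future.result()
    # Merging happens on the caller's side (the device in this sketch).
    merged.sort(key=lambda d: d[1], reverse=True)
    return merged[:k]


if __name__ == "__main__":
    cloud = make_retriever(["dragons are mythical reptiles",
                            "edge devices have limited memory"])
    local = make_retriever(["my notes on edge devices and memory limits",
                            "grocery list"])
    for text, s in distributed_retrieve("edge devices memory", cloud, local):
        print(f"{s:.2f}  {text}")
```

In a real deployment the cloud retriever would be a network call and the scorer would be an embedding-based similarity; running the two lookups concurrently keeps the slower one from adding to the faster one's latency.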

DRAGON narrows the gap between the constraints of edge hardware and the performance expected of language models, making advanced language capabilities feasible on resource-limited devices.

Efficient Distributed Retrieval-Augmented Generation for Enhancing Language Model Performance