
DRAGON: Boosting Small LMs on Edge Devices
A distributed framework for efficient retrieval-augmented generation
The DRAGON framework helps small language models perform better on edge devices by retrieving knowledge from both cloud and on-device sources.
- Distributed Retrieval: Efficiently accesses both cloud databases and private on-device documents
- Speculative Aggregation: Reduces latency by optimizing how knowledge is retrieved and combined
- Privacy-Preserving: Maintains the security of sensitive user data while leveraging cloud resources
- Resource-Efficient: Enables small LMs to approach the performance of larger models without intensive retraining
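To make the distributed-retrieval idea concrete, here is a minimal sketch of querying a cloud corpus and an on-device corpus separately and merging the top results. It is an illustration only, not DRAGON's actual algorithm: the `Hit` type, the token-overlap `score` function, and the `retrieve` and `merge_top_k` helpers are all hypothetical, and speculative aggregation is not modeled here.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float
    source: str  # "cloud" or "device" (labels assumed for this sketch)


def score(query: str, text: str) -> float:
    """Toy relevance: fraction of query tokens that also appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve(query: str, corpus: dict[str, str], source: str, k: int) -> list[Hit]:
    """Return the k best-scoring documents from a single corpus."""
    hits = [Hit(doc_id, score(query, text), source) for doc_id, text in corpus.items()]
    return heapq.nlargest(k, hits, key=lambda h: h.score)


def merge_top_k(cloud_hits: list[Hit], device_hits: list[Hit], k: int) -> list[Hit]:
    """Combine both candidate lists and keep the k best overall."""
    return heapq.nlargest(k, [*cloud_hits, *device_hits], key=lambda h: h.score)


# Hypothetical example data: general knowledge in the cloud, private notes on device.
cloud_corpus = {
    "c1": "paris is the capital of france",
    "c2": "bananas are yellow",
}
device_corpus = {
    "d1": "my flight to paris leaves friday",
    "d2": "grocery list milk eggs",
}

query = "when is my flight to paris"
merged = merge_top_k(
    retrieve(query, cloud_corpus, "cloud", k=1),
    retrieve(query, device_corpus, "device", k=1),
    k=2,
)
for hit in merged:
    print(hit.source, hit.doc_id, round(hit.score, 2))
```

In this toy setup the private note ranks above the general fact, which shows why on-device documents matter for personal queries. A real system would replace the overlap score with a learned retriever and would have to decide what, if anything, leaves the device.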
DRAGON narrows the gap between edge-computing constraints and the performance expected of language models, making retrieval-augmented generation more practical on resource-limited devices.
Efficient Distributed Retrieval-Augmented Generation for Enhancing Language Model Performance