Speeding Up RAG Systems with Smart Caching

Proximity speeds up Retrieval-Augmented Generation (RAG) with an approximate cache: when a new query is sufficiently similar to one seen before, it reuses the documents retrieved for the earlier query instead of repeating the retrieval step.

Reduces RAG inference time by up to 9.5x by leveraging approximate caching
Maintains answer quality while improving throughput by 2.7-3.4x
Requires no changes to existing RAG pipelines
Particularly effective for domain-specific applications (e.g., medical queries)
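
The caching idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ApproximateCache`, the Euclidean distance, the `tolerance` threshold, the linear scan over cached keys, and the FIFO eviction policy are all assumptions chosen to keep the example small. The `retrieve` callable stands in for whatever retriever an existing pipeline already uses.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ApproximateCache:
    """Hypothetical approximate cache placed in front of a retriever.

    A lookup is a hit when the nearest cached query embedding lies within
    `tolerance` of the new one; the cached documents are then reused.
    """

    def __init__(self, retrieve: Callable[[Vector], List[str]],
                 tolerance: float, capacity: int = 128) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.retrieve = retrieve      # the pipeline's existing retrieval call
        self.tolerance = tolerance    # assumed similarity threshold
        self.capacity = capacity
        self.entries: List[Tuple[Tuple[float, ...], List[str]]] = []
        self.hits = 0
        self.misses = 0

    def lookup(self, query_embedding: Vector) -> List[str]:
        # Linear scan for the closest cached key (fine for a small cache).
        best_docs, best_dist = None, math.inf
        for key, docs in self.entries:
            dist = euclidean(key, query_embedding)
            if dist < best_dist:
                best_docs, best_dist = docs, dist

        if best_docs is not None and best_dist <= self.tolerance:
            self.hits += 1
            return best_docs          # reuse documents from a similar query

        # Miss: fall back to the real retriever and remember the result.
        self.misses += 1
        docs = self.retrieve(query_embedding)
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)       # FIFO eviction (an assumption)
        self.entries.append((tuple(query_embedding), docs))
        return docs


# Usage with a stand-in retriever that records how often it is called.
calls: List[Tuple[float, ...]] = []


def fake_retrieve(embedding: Vector) -> List[str]:
    calls.append(tuple(embedding))
    return [f"doc-for-{embedding[0]:.1f}"]


cache = ApproximateCache(fake_retrieve, tolerance=0.5, capacity=2)
first = cache.lookup([1.0, 0.0])    # miss: retriever is called
second = cache.lookup([1.1, 0.1])   # hit: close to the first query
third = cache.lookup([5.0, 5.0])    # miss: too far from any cached key
```

In this sketch the tolerance is the main knob: a larger value produces more cache hits and fewer retriever calls, at the risk of returning documents that fit the new query less well.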

Why it matters for medicine: Healthcare applications need fast, accurate responses from LLMs. Proximity lets medical RAG systems deliver answers more quickly while maintaining accuracy, which is essential for clinical decision support and medical information retrieval.

Leveraging Approximate Caching for Faster Retrieval-Augmented Generation