Smarter Cache Sharing for LLMs

Improving inference efficiency through semantic similarity

KVShare introduces a novel approach to sharing Key-Value (KV) caches across multiple users based on semantic similarity rather than exact text matching (a minimal illustrative sketch follows the bullet points below).

  • Enables fine-grained reuse of KV caches between different but semantically similar queries
  • Overcomes limitations of traditional prefix caching while maintaining response diversity
  • Achieves significant inference efficiency improvements for LLMs and MLLMs
  • Particularly valuable for applications with repetitive query patterns like education and customer support

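To make the idea concrete, the sketch below shows what a semantic-similarity cache lookup could look like. This is a minimal, hypothetical illustration: the SemanticKVCache class, the embed_fn hook, and the similarity threshold are assumptions chosen for exposition, not KVShare's actual components, and the paper's fine-grained cache editing is not reproduced here.

```python
# Hypothetical sketch of semantic-similarity KV-cache lookup (not the KVShare implementation).
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

import numpy as np


@dataclass
class SemanticKVCache:
    """Reuse a stored KV cache when a new query is semantically close to a
    previously seen query, instead of requiring an exact prefix match."""
    embed_fn: Callable[[str], np.ndarray]   # maps text to a 1-D embedding vector
    threshold: float = 0.9                  # minimum cosine similarity for reuse
    _embeddings: List[np.ndarray] = field(default_factory=list)
    _kv_caches: List[Any] = field(default_factory=list)

    def _normalize(self, v: np.ndarray) -> np.ndarray:
        return v / (np.linalg.norm(v) + 1e-8)

    def lookup(self, query: str) -> Optional[Any]:
        """Return the KV cache of the most similar stored query, or None on a miss."""
        if not self._embeddings:
            return None
        q = self._normalize(self.embed_fn(query))
        sims = np.array([float(q @ e) for e in self._embeddings])
        best = int(np.argmax(sims))
        return self._kv_caches[best] if sims[best] >= self.threshold else None

    def insert(self, query: str, kv_cache: Any) -> None:
        """Store a freshly computed KV cache keyed by the query's embedding."""
        self._embeddings.append(self._normalize(self.embed_fn(query)))
        self._kv_caches.append(kv_cache)


# Toy usage: a bag-of-words embedding stands in for a real sentence encoder.
def toy_embed(text: str) -> np.ndarray:
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    return vec


cache = SemanticKVCache(embed_fn=toy_embed, threshold=0.4)
cache.insert("explain photosynthesis to a child", kv_cache="<KV tensors>")
print(cache.lookup("explain photosynthesis for kids"))   # likely hit: shared key tokens
print(cache.lookup("derive the quadratic formula"))      # likely miss: no overlap
```

In a real deployment, a proper sentence-embedding model and a stricter threshold would replace the toy embedding, and the reused cache would presumably still need per-token adjustment wherever the new query diverges; that fine-grained reuse, rather than whole-prefix matching, is what the bullet points above describe.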
This engineering innovation directly addresses a core computational bottleneck in LLM deployment, making real-time AI assistants more scalable and cost-effective.

KVShare: Semantic-Aware Key-Value Cache Sharing for Efficient Large Language Model Inference
