Smarter Cache Sharing for LLMs

Boosting inference efficiency through semantic similarity

KVShare introduces a novel approach to sharing key-value caches across multiple users based on semantic similarity rather than exact text matching.

  • Enables fine-grained cache reuse between semantically similar but textually different queries
  • Balances efficient resource utilization against the need to preserve response diversity
  • Particularly valuable for domains with repetitive query patterns like education and customer support
  • Overcomes limitations of existing prefix-caching and semantic-caching methods
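
The core lookup flow can be illustrated with a small sketch: embed the incoming query, compare it with embeddings of previously served queries, and reuse the stored KV cache when the similarity clears a threshold. This is a minimal, assumed illustration rather than KVShare's actual implementation: the embed stand-in, the 0.85 threshold, and the SemanticKVCache class are placeholders, and KVShare performs finer-grained reuse than the whole-cache substitution shown here.

import hashlib
import numpy as np


def embed(text: str, dim: int = 256) -> np.ndarray:
    # Toy deterministic embedding used as a stand-in; a real system would use a
    # sentence encoder so that paraphrases map to nearby vectors.
    seed = int.from_bytes(hashlib.sha256(text.lower().encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


class SemanticKVCache:
    # Stores (query embedding, KV cache) pairs and reuses an entry when a new
    # query is semantically close enough to a previous one.

    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, object]] = []

    def lookup(self, query: str):
        # Return the KV cache of the most similar past query, or None on a miss.
        q = embed(query)
        best_sim, best_kv = -1.0, None
        for emb, kv in self.entries:
            sim = float(q @ emb)  # cosine similarity; embeddings are unit-norm
            if sim > best_sim:
                best_sim, best_kv = sim, kv
        return best_kv if best_sim >= self.threshold else None

    def insert(self, query: str, kv_cache) -> None:
        self.entries.append((embed(query), kv_cache))


if __name__ == "__main__":
    cache = SemanticKVCache(threshold=0.85)
    cache.insert("How do I reset my password?", kv_cache={"keys": "...", "values": "..."})
    # With a real semantic encoder, a paraphrase such as "I forgot my password,
    # how can I change it?" could also hit; the toy embedding above only matches
    # near-identical text.
    hit = cache.lookup("How do I reset my password?")
    print("cache hit" if hit is not None else "cache miss")

In this sketch a cache hit skips prefill entirely for the matched query; the design question KVShare addresses is how to reuse such caches across similar queries without every user receiving an identical response.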

This engineering innovation significantly reduces computational overhead for LLM deployments while preserving response quality and uniqueness.

KVShare: Semantic-Aware Key-Value Cache Sharing for Efficient Large Language Model Inference
