
Trustworthy AI Through Quotable LLMs
Enhancing LLM verifiability by design rather than as an afterthought
This research introduces Quote-Tuning, a novel alignment approach that trains LLMs to quote verbatim from their pre-training data, making verification of generated content straightforward and reliable.
- Creates models that primarily generate content by quoting verbatim from trusted sources
- Achieves an 88.4% quote rate while maintaining competitive performance on standard benchmarks
- Enables lightweight verification without complex post-processing or retrieval systems, since verbatim quotes can be checked directly against the source corpus (see the sketch after this list)
- Represents a paradigm shift from post-hoc citation to verifiability as a core design feature
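Because the model's output is quoted verbatim, verifying it reduces to a membership test against the trusted corpus rather than a retrieval or attribution pipeline. The sketch below illustrates this idea with a toy quote-rate check; the in-memory set of character spans and the 32-character span length are illustrative assumptions, not the paper's actual infrastructure, which relies on scalable membership testing over the full pre-training data.

```python
# Minimal sketch: measure how much of a generation appears verbatim in a
# trusted corpus. A production system would use a scalable membership-testing
# index over the pre-training data; the in-memory set here is a stand-in.

from typing import Iterable, Set

SPAN_CHARS = 32  # illustrative span length treated as a "quote"


def build_index(corpus_docs: Iterable[str], span_chars: int = SPAN_CHARS) -> Set[str]:
    """Index every overlapping character span of the trusted corpus."""
    index: Set[str] = set()
    for doc in corpus_docs:
        for i in range(len(doc) - span_chars + 1):
            index.add(doc[i:i + span_chars])
    return index


def quote_rate(generation: str, index: Set[str], span_chars: int = SPAN_CHARS) -> float:
    """Fraction of the generation's spans found verbatim in the corpus index."""
    spans = [generation[i:i + span_chars]
             for i in range(len(generation) - span_chars + 1)]
    if not spans:
        return 0.0
    hits = sum(span in index for span in spans)
    return hits / len(spans)


if __name__ == "__main__":
    corpus = ["The mitochondrion is the powerhouse of the cell, producing ATP."]
    index = build_index(corpus)
    output = "According to the text, the mitochondrion is the powerhouse of the cell."
    print(f"quote rate: {quote_rate(output, index):.2f}")
```

The same membership check that scores quoting during training can be reused at inference time, which is why verification needs no separate retrieval system: any span reported as a quote either is or is not present in the corpus.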
For security professionals, this approach addresses critical concerns about AI trustworthiness: verification is inherent to the model's operation rather than an external process, which reduces the risk of hallucinations and misinformation.
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data