
Trustworthy AI Through Quotable LLMs
Enhancing LLM verifiability by design rather than as an afterthought
This research introduces Quote-Tuning, a novel alignment approach that trains LLMs to quote verbatim from their pre-training data, making verification of generated content straightforward and reliable.
- Creates models that primarily generate content by quoting verbatim from trusted sources
- Achieves an 88.4% quote rate while maintaining competitive performance on standard benchmarks
- Enables lightweight verification without complex post-processing or retrieval systems, since verbatim quotes can be checked directly against the source corpus (see the sketch after this list)
- Represents a paradigm shift from post-hoc citation to verifiability as a core design feature
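Because the model's output is quoted verbatim, verifying it reduces to a membership test against the trusted corpus rather than a retrieval or attribution pipeline. The sketch below illustrates this idea with a toy quote-rate check; the in-memory set of character spans and the 32-character span length are illustrative assumptions, not the paper's actual infrastructure, which relies on scalable membership testing over the full pre-training data.

```python
# Minimal sketch: measure how much of a generation appears verbatim in a
# trusted corpus. A production system would use a scalable membership-testing
# index over the pre-training data; the in-memory set here is a stand-in.

from typing import Iterable, Set

SPAN_CHARS = 32  # illustrative span length treated as a "quote"


def build_index(corpus_docs: Iterable[str], span_chars: int = SPAN_CHARS) -> Set[str]:
    """Index every overlapping character span of the trusted corpus."""
    index: Set[str] = set()
    for doc in corpus_docs:
        for i in range(len(doc) - span_chars + 1):
            index.add(doc[i:i + span_chars])
    return index


def quote_rate(generation: str, index: Set[str], span_chars: int = SPAN_CHARS) -> float:
    """Fraction of the generation's spans found verbatim in the corpus index."""
    spans = [generation[i:i + span_chars]
             for i in range(len(generation) - span_chars + 1)]
    if not spans:
        return 0.0
    hits = sum(span in index for span in spans)
    return hits / len(spans)


if __name__ == "__main__":
    corpus = ["The mitochondrion is the powerhouse of the cell, producing ATP."]
    index = build_index(corpus)
    output = "According to the text, the mitochondrion is the powerhouse of the cell."
    print(f"quote rate: {quote_rate(output, index):.2f}")
```

The same membership check that scores quoting during training can be reused at inference time, which is why verification needs no separate retrieval system: any span reported as a quote either is or is not present in the corpus.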
For security professionals, this approach addresses critical concerns about AI trustworthiness: verification is inherent to the model's operation rather than an external process, which reduces the risk of hallucinations and misinformation.
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data