
Building Trustworthy AI Systems
How LLMs Can Wisely Judge External Information
This research introduces methods that help large language models decide when to trust external information sources and when to rely on their own internal knowledge.
- Situated Faithfulness: A framework for LLMs to dynamically assess the trustworthiness of external contexts against their own knowledge (a minimal sketch follows this list)
- Self-Consistency Regularization (SCR): Trains models to reason consistently about their internal knowledge
- Retrieval-Consistency Regularization (RCR): Teaches models to evaluate the reliability of retrieved information
- Improved Security: Significantly enhances model resilience against misleading or manipulative information
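To make the core idea concrete, here is a minimal Python sketch of the decision a situatedly faithful model must make: answer once from internal knowledge, once conditioned on the retrieved context, then keep whichever answer looks more trustworthy. The `Answer` dataclass, the confidence scores, and the `trust_margin` rule are illustrative assumptions, not the actual training procedures behind SCR or RCR.

```python
# Illustrative sketch only; names, scores, and the threshold rule are hypothetical.
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # model's self-reported confidence in [0, 1]

def answer_with_situated_faithfulness(
    internal: Answer, context_grounded: Answer, trust_margin: float = 0.1
) -> Answer:
    """Prefer the context-grounded answer only when it is clearly more
    trustworthy than the model's internal knowledge (hypothetical rule)."""
    if context_grounded.confidence >= internal.confidence + trust_margin:
        return context_grounded  # external context looks reliable: follow it
    return internal              # otherwise fall back on internal knowledge

# Example: a misleading retrieved passage yields a low-confidence grounded answer,
# so the model keeps its internal answer instead of being misled.
internal = Answer("Paris", confidence=0.9)
grounded = Answer("Lyon", confidence=0.4)
print(answer_with_situated_faithfulness(internal, grounded).text)  # -> Paris
```

The sketch only illustrates the trust decision itself; how the confidence estimates are produced and calibrated is what the SCR and RCR methods address.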
These advances are crucial for secure AI deployment in high-stakes environments where misinformation poses serious risks.