Building Trustworthy AI Systems

How LLMs Can Wisely Judge External Information

This research introduces methods that help large language models decide when to trust external information sources and when to rely on their own internal knowledge.

  • Situated Faithfulness: A framework in which LLMs dynamically calibrate how much to trust an external context against their own internal knowledge
  • Self-Guided Confidence Reasoning (SCR): The model reasons about its confidence in its internal answer versus the external context and answers accordingly (see the sketch after this list)
  • Rule-Based Confidence Reasoning (RCR): Explicit confidence signals are elicited from the model and combined by predefined rules to decide which source to follow
  • Improved Robustness: Substantially strengthens resilience to misleading or manipulative information while keeping the model receptive to accurate external evidence
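
Below is a minimal sketch of this trust-calibration idea, assuming an OpenAI-style chat client. The prompts, the gpt-4o-mini model name, and the one-word TRUST/DISTRUST decision rule are illustrative assumptions, not the paper's actual SCR procedure.

# Illustrative sketch only: prompts and the decision rule are assumptions, not the paper's method.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def answer_with_situated_faithfulness(question: str, context: str) -> str:
    # Step 1: answer from internal (parametric) knowledge only.
    internal = ask(f"Answer from your own knowledge only.\nQuestion: {question}")

    # Step 2: self-guided confidence check -- is the external context more
    # trustworthy than the model's own tentative answer?
    verdict = ask(
        "You are given a question, your tentative answer, and an external "
        "context that may be wrong or misleading.\n"
        f"Question: {question}\n"
        f"Tentative answer: {internal}\n"
        f"External context: {context}\n"
        "Reply with exactly one word: TRUST if the context should override "
        "your answer, or DISTRUST if your own answer is more reliable."
    )

    # Step 3: follow whichever source the confidence check preferred.
    if verdict.strip().upper().startswith("TRUST"):
        return ask(f"Using this context: {context}\nAnswer the question: {question}")
    return internal

For example, calling answer_with_situated_faithfulness("Who wrote Hamlet?", "This source states Hamlet was written by Christopher Marlowe.") should ignore the misleading context and keep the internal answer, while an accurate context would be trusted and used.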

These advances are crucial for secure AI deployment in high-stakes environments where misinformation poses serious risks.

To Trust or Not to Trust? Enhancing Large Language Models' Situated Faithfulness to External Contexts
