Building Trustworthy AI Systems

How LLMs Can Wisely Judge External Information

This research introduces methods that help large language models decide when to trust external information sources and when to rely on their own internal knowledge.

  • Situated Faithfulness: A framework in which LLMs dynamically calibrate how much to trust an external context against their own internal knowledge
  • Self-Guided Confidence Reasoning (SCR): The model reasons about its confidence in its internal answer versus the external context and answers accordingly (see the sketch after this list)
  • Rule-Based Confidence Reasoning (RCR): Explicit confidence signals are elicited from the model and combined by predefined rules to decide which source to follow
  • Improved Robustness: Substantially strengthens resilience to misleading or manipulative information while keeping the model receptive to accurate external evidence
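
Below is a minimal sketch of this trust-calibration idea, assuming an OpenAI-style chat client. The prompts, the gpt-4o-mini model name, and the one-word TRUST/DISTRUST decision rule are illustrative assumptions, not the paper's actual SCR procedure.

# Illustrative sketch only: prompts and the decision rule are assumptions, not the paper's method.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def answer_with_situated_faithfulness(question: str, context: str) -> str:
    # Step 1: answer from internal (parametric) knowledge only.
    internal = ask(f"Answer from your own knowledge only.\nQuestion: {question}")

    # Step 2: self-guided confidence check -- is the external context more
    # trustworthy than the model's own tentative answer?
    verdict = ask(
        "You are given a question, your tentative answer, and an external "
        "context that may be wrong or misleading.\n"
        f"Question: {question}\n"
        f"Tentative answer: {internal}\n"
        f"External context: {context}\n"
        "Reply with exactly one word: TRUST if the context should override "
        "your answer, or DISTRUST if your own answer is more reliable."
    )

    # Step 3: follow whichever source the confidence check preferred.
    if verdict.strip().upper().startswith("TRUST"):
        return ask(f"Using this context: {context}\nAnswer the question: {question}")
    return internal

For example, calling answer_with_situated_faithfulness("Who wrote Hamlet?", "This source states Hamlet was written by Christopher Marlowe.") should ignore the misleading context and keep the internal answer, while an accurate context would be trusted and used.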

These advances are crucial for secure AI deployment in high-stakes environments where misinformation poses serious risks.

To Trust or Not to Trust? Enhancing Large Language Models' Situated Faithfulness to External Contexts
