
The Confidence Gap in LLMs
Measuring and addressing overconfidence in large language models
This research evaluates how well large language models (LLMs) calibrate their confidence, that is, how closely their stated confidence tracks their actual accuracy, revealing significant overconfidence that poses risks in high-stakes applications.
Key findings:
- LLMs demonstrate widespread overconfidence across different tasks and question types
- Distractors (incorrect answer options designed to look plausible) significantly impact model confidence and calibration
- Model size affects confidence calibration, with implications for model selection
- Multiple-choice formats can reveal miscalibration patterns not evident in other evaluation methods
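Calibration is commonly quantified by comparing a model's stated confidence with its observed accuracy, for example via Expected Calibration Error (ECE). The sketch below illustrates one standard way to compute ECE with equal-width confidence bins; it is a minimal illustration, not the study's evaluation code, and the function name and sample data are hypothetical.

```python
# Minimal sketch of Expected Calibration Error (ECE): bin predictions by stated
# confidence and average the |accuracy - confidence| gap, weighted by bin size.
# Generic illustration only; not the evaluation code from this research.

from typing import List, Tuple


def expected_calibration_error(
    predictions: List[Tuple[float, bool]],  # (confidence in [0, 1], answer correct?)
    n_bins: int = 10,
) -> float:
    """Equal-width binning ECE over (confidence, correctness) pairs."""
    bins = [[] for _ in range(n_bins)]
    for confidence, correct in predictions:
        # Map confidence to a bin index; clamp confidence == 1.0 into the last bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, correct))

    total = len(predictions)
    ece = 0.0
    for bin_items in bins:
        if not bin_items:
            continue
        avg_confidence = sum(c for c, _ in bin_items) / len(bin_items)
        accuracy = sum(1 for _, correct in bin_items if correct) / len(bin_items)
        ece += (len(bin_items) / total) * abs(accuracy - avg_confidence)
    return ece


# Hypothetical example: a model reporting ~90% confidence while answering
# correctly only ~60% of the time is overconfident, and ECE exposes that gap.
sample = [(0.9, True), (0.9, False), (0.9, True), (0.9, False), (0.95, True)]
print(f"ECE: {expected_calibration_error(sample):.3f}")
```

A well-calibrated model would show an ECE near zero; the overconfidence patterns reported here correspond to bins where stated confidence consistently exceeds accuracy.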
Why it matters for Security: Improper confidence calibration creates security risks in high-stakes scenarios where model outputs inform critical decisions. Understanding these patterns helps develop more reliable AI systems with appropriate uncertainty expression.