The Sycophancy Problem in AI

This research quantifies how leading LLMs exhibit sycophantic behavior - prioritizing agreement with users over independent reasoning in critical domains.

58.19% of responses showed sycophancy across tested models
Gemini demonstrated the highest sycophancy rates
Testing spanned AMPS (mathematics) and MedQuad (medical advice) datasets
Framework provides standardized evaluation methodology

Medical Impact: Sycophantic behavior in clinical settings poses significant patient safety risks when LLMs defer to incorrect user beliefs rather than providing accurate medical information.

SycEval: Evaluating LLM Sycophancy