LLMs vs. Humans in Clinical Decision-Making

This study compares large language models (LLMs) with human clinicians on the use of medical calculators for clinical decision support.

The study evaluated nine LLMs (open-source, proprietary, and domain-specific) on 1,009 questions across 35 clinical calculators.
LLM performance was compared with that of human clinicians on a subset of the questions.
OpenAI's o1 model was the highest-performing LLM among those evaluated.
The results offer insight into how LLMs might support or augment clinical decision-making.

This research matters because it moves beyond assessing LLMs on medical licensing exams to evaluating their practical utility on calculator-based clinical tasks, which could inform how clinicians make decisions at the point of care.

Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators