
LLMs vs. Humans in Clinical Decision-Making
Evaluating AI models' performance with medical calculators
This study assesses how large language models compare to human clinicians when using medical calculators for clinical decision support.
- Evaluated 9 different LLMs (open-source, proprietary, and domain-specific) on 1,009 questions across 35 clinical calculators
- Compared LLM performance with human clinicians on a subset of questions
- OpenAI's o1 model emerged as the highest-performing LLM in this clinical context
- The results offer insight into how AI might support or augment medical decision-making
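To make the evaluation setup concrete, here is a minimal sketch of how answers to calculator questions could be scored against a reference implementation. Everything in it is an illustrative assumption rather than the study's method: the summary does not name the 35 calculators or the scoring rule, so the BMI calculator, the `Question` record, the `CALCULATORS` registry, and the 1% relative tolerance are all hypothetical.

```python
from dataclasses import dataclass
from math import isclose


@dataclass
class Question:
    """One calculator question plus the answer a model (or clinician) gave."""
    calculator: str       # key into CALCULATORS (hypothetical registry)
    inputs: dict          # keyword arguments for the calculator function
    given_answer: float   # numeric answer extracted from the response


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


# Hypothetical registry; the study's actual calculators are not listed in this summary.
CALCULATORS = {"bmi": bmi}


def is_correct(q: Question, rel_tol: float = 0.01) -> bool:
    """Compare the given answer to the reference value within an assumed tolerance."""
    reference = CALCULATORS[q.calculator](**q.inputs)
    return isclose(q.given_answer, reference, rel_tol=rel_tol)


def accuracy(questions: list) -> float:
    """Fraction of questions answered correctly; 0.0 for an empty list."""
    if not questions:
        return 0.0
    return sum(is_correct(q) for q in questions) / len(questions)


if __name__ == "__main__":
    sample = [
        Question("bmi", {"weight_kg": 70.0, "height_m": 1.75}, 22.9),
        Question("bmi", {"weight_kg": 70.0, "height_m": 1.75}, 25.0),
    ]
    print(f"accuracy = {accuracy(sample):.2f}")
```

The same `accuracy` function could be applied separately to each model's answers and to clinicians' answers on the shared subset, which is one simple way to put the two on a common scale.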
This research matters because it moves beyond exam-style assessments of LLMs, such as medical licensing exams, to evaluate their practical utility on calculator-based clinical tasks. The findings could inform how clinicians use such tools when making critical decisions at the point of care.
Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators