Testing LLMs on Medical Exams: Spanish MIR Study

Benchmarking 22 LLMs on clinical reasoning and diagnostic capabilities

This study evaluates how current LLMs perform on rigorous medical certification exams, measuring both knowledge recall and complex clinical problem solving.

Key findings:

  • 22 different LLMs were tested on the Spanish Medical Intern Resident (MIR) examination
  • The research assessed both text-based clinical reasoning and multimodal capabilities through image interpretation
  • Results reveal the current state of LLMs' medical expertise and clinical reasoning abilities
  • The study provides insights into model limitations for healthcare applications

Business relevance: This benchmark helps healthcare organizations evaluate LLM reliability for medical applications, identifying which models demonstrate sufficient expertise for potential clinical support roles while highlighting areas requiring human oversight.

Evaluating Large Language Models on the Spanish Medical Intern Resident (MIR) Examination 2024/2025: A Comparative Analysis of Clinical Reasoning and Knowledge Application
