
Evaluating LLMs in Traditional Chinese Medicine
The first comprehensive benchmark for assessing AI models in TCM contexts
TCM-3CEval introduces a novel triaxial benchmark designed to evaluate large language models' capabilities in Traditional Chinese Medicine across three critical dimensions.
- Assesses core knowledge mastery, classical text understanding, and clinical decision-making in TCM
- Evaluates diverse models including global LLMs (GPT-4o), Chinese models (InternLM), and medical-specific models (PLUSE)
- Reveals a clear performance hierarchy among different model types
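As a rough illustration of how a three-axis evaluation like this could be tallied, the sketch below computes one accuracy score per dimension. It assumes each benchmark item has a single gold answer and is scored by exact match; the summary does not state the benchmark's actual item format or metric. The axis labels, the `per_axis_accuracy` function, and the sample records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical axis labels mirroring the three dimensions listed above.
AXES = ("core_knowledge", "classical_text", "clinical_decision")


def per_axis_accuracy(records):
    """Return {axis: accuracy} for records of (axis, predicted, gold).

    Assumes exact-match scoring, which is an illustrative choice and not
    necessarily the metric used by the benchmark itself.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for axis, predicted, gold in records:
        if axis not in AXES:
            raise ValueError(f"unknown axis: {axis}")
        total[axis] += 1
        correct[axis] += int(predicted == gold)
    # Report only axes that actually received items.
    return {axis: correct[axis] / total[axis] for axis in AXES if total[axis]}


# Made-up answers for illustration only; not data from the benchmark.
example = [
    ("core_knowledge", "A", "A"),
    ("core_knowledge", "B", "C"),
    ("classical_text", "D", "D"),
    ("clinical_decision", "A", "B"),
]
scores = per_axis_accuracy(example)
```

Keeping the three scores separate, rather than averaging them into one number, makes it possible to compare model types on each dimension.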
This research matters for medical AI deployment because it addresses a gap in how LLMs are evaluated in traditional medicine contexts, which could improve healthcare access and support knowledge preservation in TCM practice.