
Evaluating Health Information in Chinese LLMs
First comprehensive benchmark for safety assessment in Chinese healthcare AI
CHBench introduces the first safety-focused benchmark for evaluating how large language models (LLMs) respond to Chinese health-related inquiries, addressing a critical gap in LLM assessment.
- Assesses how Chinese LLMs handle both physical and mental health questions
- Identifies potential risks of medical misinformation in AI responses
- Provides a standardized framework for measuring healthcare safety in Chinese language models
- Emphasizes the real-world consequences of inaccurate health information in AI systems
This research matters for the responsible deployment of AI in healthcare, where incorrect information could lead to serious patient harm. It is especially relevant in Chinese-speaking regions, where specialized evaluation tools have been lacking.
CHBench: A Chinese Dataset for Evaluating Health in Large Language Models