Evaluating Health Information in Chinese LLMs

First comprehensive benchmark for safety assessment in Chinese healthcare AI

CHBench is the first safety-focused benchmark for evaluating how Large Language Models respond to health-related inquiries in Chinese, addressing a critical gap in LLM assessment.

  • Assesses how Chinese LLMs handle both physical and mental health topics
  • Identifies potential risks of medical misinformation in AI responses
  • Provides a standardized framework for measuring healthcare safety in Chinese language models
  • Emphasizes the real-world consequences of inaccurate health information in AI systems

This research supports the responsible deployment of AI in healthcare contexts, where incorrect information could lead to serious patient harm, particularly in Chinese-speaking regions where specialized evaluation tools have been lacking.

CHBench: A Chinese Dataset for Evaluating Health in Large Language Models
