Exposing the Vulnerabilities of Chinese LLMs

JailBench introduces the first comprehensive security assessment benchmark specifically designed to evaluate Chinese LLMs against sophisticated jailbreak attacks.

Employs diverse attack strategies including role-playing, context manipulation, encoding tricks, and instruction following vulnerabilities
Evaluates models across multiple risk categories including illegal content, discrimination, privacy violations, and harmful information
Reveals significant security gaps in current Chinese LLMs, with open-source models showing higher vulnerability than proprietary ones
Demonstrates that translation-based testing often fails to detect vulnerabilities specific to Chinese linguistic features

This research is crucial for developing more secure AI systems in Chinese markets by identifying and addressing language-specific security vulnerabilities before deployment.

JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models