
Detecting Self-Destructive Content Across Languages
First bilingual benchmark for evaluating LLMs on harmful content detection
JiraiBench introduces the first comprehensive framework for evaluating how well large language models can detect self-destructive content in Chinese and Japanese social media communities.
- Focuses on the transnational "Jirai" online subculture, in which self-harm, drug-overdose, and eating-disorder content proliferates
- Provides a bilingual evaluation benchmark incorporating both linguistic and cultural dimensions
- Highlights both the capabilities and the limitations of LLMs in cross-cultural harmful content detection
This research has critical applications in medical and mental-health monitoring, enabling earlier intervention for at-risk individuals and improving content moderation systems across cultural contexts.