Detecting Self-Destructive Content Across Languages

First bilingual benchmark for evaluating LLMs on harmful content detection

JiraiBench introduces the first comprehensive framework for evaluating how well large language models can detect self-destructive content in Chinese and Japanese social media communities.

  • Focuses on the transnational "Jirai" online subculture where self-harm, drug overdose, and eating disorder content proliferates
  • Provides a bilingual evaluation benchmark incorporating both linguistic and cultural dimensions (a scoring sketch follows this list)
  • Highlights LLMs' capabilities and limitations in cross-cultural harmful content detection
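
To make the evaluation setup concrete, here is a minimal sketch of how a bilingual detection benchmark of this kind could be scored with per-language macro F1. The sample records, label names, language codes, and the `classify` stub are illustrative assumptions, not the actual JiraiBench data format or protocol.

```python
# Hypothetical sketch of scoring a bilingual harmful-content benchmark.
# Dataset fields, labels, and classify() are assumptions for illustration.
from collections import defaultdict
from sklearn.metrics import f1_score

# Toy records standing in for benchmark posts (text, language, gold label).
SAMPLES = [
    {"text": "...", "lang": "zh", "label": "self_harm"},
    {"text": "...", "lang": "zh", "label": "none"},
    {"text": "...", "lang": "ja", "label": "eating_disorder"},
    {"text": "...", "lang": "ja", "label": "none"},
]

def classify(text: str, lang: str) -> str:
    """Placeholder for an LLM call mapping a post to a risk category."""
    # A real run would prompt the model under test; a fixed answer keeps
    # this sketch runnable end to end.
    return "none"

def evaluate(samples):
    """Compute macro F1 separately for each language subset."""
    by_lang = defaultdict(lambda: ([], []))  # lang -> (gold labels, predictions)
    for s in samples:
        gold, pred = by_lang[s["lang"]]
        gold.append(s["label"])
        pred.append(classify(s["text"], s["lang"]))
    return {
        lang: f1_score(gold, pred, average="macro", zero_division=0)
        for lang, (gold, pred) in by_lang.items()
    }

if __name__ == "__main__":
    print(evaluate(SAMPLES))  # e.g. {'zh': ..., 'ja': ...}
```

Reporting per-language scores, rather than a single pooled number, is what lets a benchmark like this surface gaps between a model's Chinese and Japanese detection performance.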

This research has critical applications in medical and mental health monitoring, enabling earlier intervention for at-risk individuals and improving content moderation systems across different cultural contexts.

JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human Self-Destructive Behavior Content in Jirai Community
