Breaking Language Barriers in Content Moderation

Adapting LLMs for Low-Resource Languages: The Sinhala Case Study

This research addresses the critical gap in offensive language detection between high- and low-resource languages by introducing novel adaptation strategies for Sinhala.

Key Innovations:

  • Introduction of four new models, including Subasa-XLM-R, which uses intermediate pre-finetuning
  • Successful adaptation of language models to a low-resource language
  • Novel fine-tuning techniques specifically optimized for Sinhala
  • Enhanced detection capabilities for offensive content moderation

Business Impact: These advancements enable more effective content moderation and social media safety systems in previously underserved linguistic communities, expanding the reach of security applications across language barriers.

Subasa - Adapting Language Models for Low-resourced Offensive Language Detection in Sinhala