
Breaking Language Barriers in Content Moderation
Adapting Language Models for Low-Resource Languages: The Sinhala Case Study
This research addresses the critical gap in offensive language detection capabilities between high- and low-resource languages by introducing novel adaptation strategies for Sinhala.
Key Innovations:
- Four new models, including Subasa-XLM-R, which adds an intermediate pre-finetuning step
- Successful adaptation of existing language models to a low-resource language
- Novel fine-tuning techniques optimized specifically for Sinhala (see the sketch after this list)
- Improved detection of offensive content for moderation systems
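To make the adaptation concrete, here is a minimal sketch of task fine-tuning an XLM-R classifier for Sinhala offensive language detection using the Hugging Face libraries. The base model name, dataset files, and hyperparameters are illustrative assumptions, not the paper's exact configuration, and the intermediate pre-finetuning step mentioned above is omitted here.

```python
# Minimal sketch: fine-tuning an XLM-R classifier for Sinhala offensive
# language detection. Base model, file names, and hyperparameters are
# illustrative assumptions, not the paper's exact setup.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "xlm-roberta-base"  # assumption: the paper may use another variant

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL, num_labels=2  # offensive vs. not offensive
)

# Hypothetical CSV files with "text" and "label" columns; swap in the actual
# Sinhala offensive-language corpus used for training.
data = load_dataset(
    "csv",
    data_files={"train": "sinhala_train.csv", "validation": "sinhala_dev.csv"},
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="subasa-xlmr-sketch",
        per_device_train_batch_size=16,
        num_train_epochs=3,
        learning_rate=2e-5,
    ),
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
```

In the full approach described above, this task fine-tuning would follow the intermediate pre-finetuning stage rather than start directly from the base checkpoint.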
Business Impact: These advancements enable more effective content moderation and social media safety systems in previously underserved linguistic communities, expanding the reach of security applications across language barriers.
Paper: Subasa - Adapting Language Models for Low-resourced Offensive Language Detection in Sinhala