
Making LLMs Safe in All Languages
Novel safety alignment for low-resource language varieties such as Singlish
This research addresses the critical gap in safety alignment for non-standard English varieties and low-resource languages, using Singlish as a case study.
- Compared three alignment methods for reducing toxic responses: supervised fine-tuning (SFT), direct preference optimization (DPO), and Kahneman-Tversky optimization (KTO)
- Found that combining SFT with KTO achieved the largest reduction in harmful outputs among the configurations tested
- Demonstrated effectiveness of targeted safety alignment for languages beyond standard English
- Proposed a generalizable framework for other low-resource languages
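The preference-optimization objectives compared above differ in what they optimize: DPO scores pairs of preferred and dispreferred responses, while KTO scores individual responses as desirable or undesirable relative to a reference point. As a rough illustration (a minimal sketch using the published DPO and KTO loss formulas; the numeric inputs are made-up log-probability ratios, not data from this work):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(chosen_logratio: float, rejected_logratio: float, beta: float = 0.1) -> float:
    """DPO pairwise loss: -log sigmoid(beta * (chosen - rejected)).

    Each log-ratio is log pi_theta(y|x) - log pi_ref(y|x) for one response.
    """
    return -math.log(sigmoid(beta * (chosen_logratio - rejected_logratio)))

def kto_value(logratio: float, kl_estimate: float, desirable: bool,
              beta: float = 0.1, lam_d: float = 1.0, lam_u: float = 1.0) -> float:
    """KTO per-example value (higher is better; the loss is lambda - value).

    Desirable responses are rewarded for exceeding the KL reference point,
    undesirable (e.g. toxic) responses for falling below it.
    """
    if desirable:
        return lam_d * sigmoid(beta * (logratio - kl_estimate))
    return lam_u * sigmoid(beta * (kl_estimate - logratio))

# Toy example: a safe (chosen) vs. a toxic (rejected) completion.
print(dpo_loss(2.0, -1.0))
print(kto_value(2.0, 0.5, desirable=True))
print(kto_value(-1.0, 0.5, desirable=False))
```

Because KTO needs only per-example desirable/undesirable labels rather than ranked pairs, it is often easier to apply when preference data is scarce, which is one plausible reason an SFT-then-KTO pipeline suits a low-resource setting.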
This work matters for security because it helps ensure AI systems remain safe when deployed in diverse linguistic contexts, reducing potential harm to underrepresented language communities.