
CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer

About

Ensuring content safety in large language models (LLMs) is essential for their deployment in real-world applications. However, existing safety guardrails are predominantly tailored to high-resource languages, leaving a significant portion of the world's population, those who communicate in low-resource languages, underrepresented. To address this, we introduce CREST (CRoss-lingual Efficient Safety Transfer), a parameter-efficient multilingual safety classification model that supports 100 languages with only 0.5B parameters. By training on a strategically chosen subset of only 13 high-resource languages, our model uses cluster-based cross-lingual transfer to extend from these few languages to 100, enabling effective generalization to both unseen high-resource and low-resource languages. This approach addresses the challenge of limited training data in low-resource settings. We conduct comprehensive evaluations across six safety benchmarks, demonstrating that CREST outperforms existing state-of-the-art guardrails of comparable scale and achieves competitive results against models with significantly larger parameter counts (2.5B parameters and above). Our findings highlight the limitations of language-specific guardrails and underscore the importance of developing universal, language-agnostic safety systems that can scale effectively to serve global populations.
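The abstract does not say how languages are clustered or how the 13 training languages are picked. One plausible reading of "cluster-guided" selection can be sketched under loudly stated assumptions: each language is represented by a hypothetical feature vector (for example, derived from typology or multilingual embeddings), languages are grouped with plain k-means, and the member with the most labeled safety data is kept from each cluster. The `Language` record, `select_training_languages`, and the toy data are all hypothetical illustrations, not CREST's actual procedure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    """Hypothetical per-language record, used only for illustration."""
    code: str          # language code
    features: tuple    # hypothetical typology/embedding vector (assumption)
    num_examples: int  # hypothetical size of labeled safety data (assumption)


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [points[0]]
    while len(centroids) < k:
        # Add the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)
    assign = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign


def select_training_languages(langs, k):
    """Cluster languages, then keep the best-resourced member of each cluster."""
    assign = kmeans([lang.features for lang in langs], k)
    chosen = {}
    for lang, cluster in zip(langs, assign):
        best = chosen.get(cluster)
        if best is None or lang.num_examples > best.num_examples:
            chosen[cluster] = lang
    return sorted(lang.code for lang in chosen.values())


# Toy, made-up data: three loose "families" in a 2-D feature space.
TOY_LANGUAGES = [
    Language("en", (0.0, 0.0), 1000),
    Language("de", (0.2, 0.1), 500),
    Language("nl", (0.1, 0.2), 50),
    Language("hi", (5.0, 5.0), 800),
    Language("mr", (5.2, 4.9), 40),
    Language("bn", (4.9, 5.3), 60),
    Language("zh", (10.0, 0.0), 900),
    Language("yue", (10.2, 0.3), 30),
]

if __name__ == "__main__":
    print(select_training_languages(TOY_LANGUAGES, k=3))  # ['en', 'hi', 'zh']
```

With k = 3 on the toy data, one representative per cluster is kept, and the lower-resource members of each cluster (for example mr or yue) would then depend on transfer from their cluster's representative. Keeping the best-resourced member is itself an assumption; data quality or centrality within the cluster would be equally reasonable criteria.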

Lavish Bansal, Naman Mishra • 2025

Related benchmarks

Task                            Dataset                                      Metric             Result  Rank
Safety Classification           XSTest                                       F1 Score           69.83   16
Safety Classification           MultiJail                                    F1 Score           0.9335  15
Safety Classification           Aya Redteaming                               --                 --      14
Safety                          Cultural Kaleidoscope                        F1 Score           69.42   7
Safety                          IndicSafe En                                 F1 Score           84.89   7
Multilingual Safety Evaluation  6 Safety Datasets (High-Resource Languages)  Safety Score (Fr)  0.8606  5
Safety Classification           RTP LX                                       F1 Score           79.86   2
Safety Classification           PTP                                          F1 Score           81.28   2
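Most rows report an F1 score, but on two scales: MultiJail's 0.9335 is on a 0 to 1 scale, while XSTest's 69.83 is a percentage, so 0.9335 presumably corresponds to 93.35 in percentage form. F1 for binary safe/unsafe classification can be sketched minimally, assuming "unsafe" is the positive class (the page does not state the positive class or any averaging scheme). The function name `f1_score` and the labels here are illustrative, not taken from the paper.

```python
def f1_score(y_true, y_pred, positive="unsafe"):
    """Binary F1 treating `positive` as the positive class; returns a value in [0, 1]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed unsafe items
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


labels = ["unsafe", "unsafe", "safe", "safe", "unsafe"]
preds = ["unsafe", "safe", "safe", "unsafe", "unsafe"]
score = f1_score(labels, preds)
print(round(score, 4), round(100 * score, 2))  # 0.6667 66.67
```

Here two of three unsafe items are caught (recall 2/3) and two of three "unsafe" predictions are correct (precision 2/3), giving F1 = 2/3, or 66.67 on the percentage scale.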
