CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer
About
Ensuring content safety in large language models (LLMs) is essential for their deployment in real-world applications. However, existing safety guardrails are predominantly tailored to high-resource languages, leaving underrepresented the significant portion of the world's population that communicates in low-resource languages. To address this, we introduce CREST (CRoss-lingual Efficient Safety Transfer), a parameter-efficient multilingual safety classification model that supports 100 languages with only 0.5B parameters. By training on a strategically chosen subset of only 13 high-resource languages, our model leverages cluster-based cross-lingual transfer from a few languages to 100, enabling effective generalization to both unseen high-resource and low-resource languages. This approach addresses the challenge of limited training data in low-resource settings. We conduct comprehensive evaluations across six safety benchmarks and demonstrate that CREST outperforms existing state-of-the-art guardrails of comparable scale and achieves competitive results against models with significantly larger parameter counts (2.5B parameters and above). Our findings highlight the limitations of language-specific guardrails and underscore the importance of developing universal, language-agnostic safety systems that can scale effectively to serve global populations.
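The cluster-guided selection described above can be sketched as follows. This is a minimal illustration, not the CREST implementation: the toy feature vectors, the set of high-resource languages, and the use of plain k-means over language representations are all assumptions made for the example, with one high-resource representative chosen per cluster to form the training set.

```python
import math
import random

# Hypothetical toy feature vectors standing in for language representations.
# Real systems would derive these from typological databases or embeddings.
LANG_FEATURES = {
    "en": (0.9, 0.1, 0.2), "fr": (0.8, 0.2, 0.3), "de": (0.7, 0.3, 0.2),
    "hi": (0.2, 0.9, 0.4), "bn": (0.1, 0.8, 0.5),
    "sw": (0.3, 0.4, 0.9), "zu": (0.2, 0.5, 0.8),
}
HIGH_RESOURCE = {"en", "fr", "de", "hi", "sw"}  # illustrative split

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means over tuples; returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
    return centroids

def select_training_languages(features, high_resource, k, seed=0):
    """Cluster all languages, then pick the high-resource language
    closest to each cluster centroid as a transfer source."""
    points = [tuple(v) for v in features.values()]
    centroids = kmeans(points, k, seed=seed)
    chosen = set()
    for cen in centroids:
        _, lang = min(
            (math.dist(tuple(features[l]), cen), l)
            for l in features if l in high_resource
        )
        chosen.add(lang)
    return chosen

selected = select_training_languages(LANG_FEATURES, HIGH_RESOURCE, k=3)
print(selected)  # a small high-resource subset covering all clusters
```

Training only on the selected representatives, then evaluating on the unselected (including low-resource) languages in each cluster, mirrors the few-to-100 transfer setup at toy scale.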
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Safety Classification | XSTest | F1 Score | 69.83 | 16 |
| Safety Classification | Aya Redteaming | -- | -- | 14 |
| Safety | Cultural Kaleidoscope | F1 Score | 69.42 | 7 |
| Safety | IndicSafe En | F1 Score | 84.89 | 7 |
| Multilingual Safety Evaluation | 6 Safety Datasets, High-Resource Languages | Safety Score (Fr) | 0.8606 | 5 |
| Safety Classification | MultiJail | F1 Score | 0.9335 | 2 |
| Safety Classification | RTP LX | F1 Score | 79.86 | 2 |
| Safety Classification | PTP | F1 Score | 81.28 | 2 |