
Preserving Fairness and Safety in Quantized LLMs Through Critical Weight Protection

About

Quantization is widely adopted to reduce the computational cost of large language models (LLMs); however, its implications for fairness and safety, particularly in dynamic quantization and multilingual contexts, remain underexplored. In this work, we conduct a systematic study of how static and dynamic quantization methods impact fairness and safety across benchmarks measuring intrinsic and extrinsic bias and safety alignment. For fairness, we evaluate English, French, Dutch, Spanish, and Turkish; for safety, we focus on English, Korean, and Arabic. Our findings reveal that quantization consistently degrades fairness and safety, with dynamic methods demonstrating greater stability than static ones. Moreover, fairness degradation varies across languages, while safety deterioration is especially pronounced in non-English settings. To address these risks, we introduce Critical Weight Protection, a novel technique that identifies and preserves fairness- and safety-critical weights during quantization. This approach effectively mitigates bias and safety deterioration without costly retraining or alignment, maintaining trustworthiness while retaining efficiency.
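The abstract does not detail how Critical Weight Protection selects or preserves weights, but the core idea it describes — identify fairness/safety-critical weights and exempt them from quantization — can be sketched as follows. Everything here is an assumption for illustration: the saliency criterion (`importance`), the protected fraction, and the uniform symmetric quantizer are hypothetical stand-ins, not the paper's actual method.

```python
import numpy as np

def quantize_with_protection(w, importance, protect_frac=0.01, n_bits=8):
    """Uniform symmetric quantization that skips the most 'critical' weights.

    `importance` is a per-weight saliency score (hypothetical; the paper's
    actual criterion for fairness/safety-critical weights is not given in
    the abstract). The top `protect_frac` fraction of weights by importance
    is kept in full precision; the rest are quantized to `n_bits`.
    """
    flat_w = w.ravel().astype(np.float64)
    flat_imp = importance.ravel()
    k = max(1, int(protect_frac * flat_w.size))

    # Indices of the k highest-importance weights -> protected set
    protected = np.argpartition(flat_imp, -k)[-k:]

    # Symmetric uniform quantization of the full tensor
    scale = np.abs(flat_w).max() / (2 ** (n_bits - 1) - 1)
    q = np.round(flat_w / scale) * scale

    # Restore protected weights to their full-precision values
    q[protected] = flat_w[protected]
    return q.reshape(w.shape)
```

In this sketch, protected weights are stored at full precision (a sparse outlier set, as in mixed-precision quantization schemes), so efficiency is mostly retained when `protect_frac` is small.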

Muhammad Alif Al Hakim, Alfan Farizki Wicaksono, Fajri Koto • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Bias Measurement | StereoSet | Overall SS | 50.215 | 25 |
| Safety Evaluation | HEX-PHI (test) | ASR | 7.333 | 12 |
| Fairness Evaluation | Jigsaw | BiasAUC | 75.6 | 9 |
| Safety Evaluation | SafetyBench (test) | Accuracy | 81.321 | 9 |
| Safety Evaluation | MultiJail-KO (test) | Safety Rate | 90.053 | 9 |
| Safety Evaluation | MultiJail-AR (test) | Safety Rate (%) | 89.101 | 9 |
| Fairness Evaluation | CrowS-Pair En | Stereotype Score | 64.58 | 9 |
| Safety Evaluation | Do-Not-Answer (test) | ASR | 3.301 | 9 |
| Fairness Evaluation | CrowS-Pair Fr | SS | 52.057 | 9 |
| Safety Evaluation | MultiJail EN (test) | Safety Rate | 92.487 | 9 |
