CALM: A CKA-Guided Adaptive Layer-Wise Modularization Framework for LLM Quantization
About
Current mainstream post-training quantization (PTQ) methods for large language models typically apply a uniform quantization strategy across all network layers, overlooking the substantial differences in algorithmic suitability among layers. To address this limitation, we propose CALM (CKA-guided Adaptive Layer-wise Modularization), a fine-tuning-free, plug-and-play framework for algorithmically heterogeneous quantization. CALM independently evaluates multiple PTQ algorithms on each layer and employs linear Centered Kernel Alignment (CKA) as a metric to automatically select the optimal quantization strategy per layer. The individually optimized strategies are then integrated to construct a hybrid quantized model. Experiments demonstrate that our approach consistently outperforms both uniform quantization baselines and state-of-the-art mixed-precision methods across mainstream LLMs, including LLaMA and Qwen, in terms of perplexity (PPL) and downstream task performance.
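The per-layer selection step described above can be sketched as follows. This is a minimal illustration, not the authors' released code: `linear_cka` implements the standard linear CKA similarity between two activation matrices, and the hypothetical helper `select_per_layer` picks, for each layer, the candidate PTQ algorithm whose quantized activations align best with the full-precision ones.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices of shape (n_samples, n_features).

    CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) after centering.
    """
    X = X - X.mean(axis=0, keepdims=True)  # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

def select_per_layer(fp_acts, quant_acts_by_algo):
    """Pick the PTQ algorithm with the highest CKA to full precision, per layer.

    fp_acts: list of per-layer full-precision activations on calibration data.
    quant_acts_by_algo: dict mapping algorithm name -> list of per-layer
    activations produced by that algorithm's quantized model.
    """
    choices = []
    for i, fp in enumerate(fp_acts):
        best = max(
            quant_acts_by_algo,
            key=lambda algo: linear_cka(fp, quant_acts_by_algo[algo][i]),
        )
        choices.append(best)
    return choices
```

The selected per-layer algorithms would then be stitched together into the final hybrid quantized model; CKA is attractive here because it is invariant to isotropic scaling and orthogonal transforms of the representations, so it compares layers on representational similarity rather than raw activation error.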
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 7.6 | 1541 |
| Commonsense Reasoning | HellaSwag | Accuracy | 77.13 | 1460 |
| Multi-task Language Understanding | MMLU | Accuracy | 65.87 | 842 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 68 | 797 |
| Code Generation | HumanEval (test) | -- | -- | 444 |
| Language Modeling | C4 (val) | PPL | 12.72 | 392 |
| Language Modeling | WikiText-2 (val) | Perplexity (PPL) | 6.89 | 277 |
| Math Reasoning | GSM8K | Accuracy | 74.33 | 126 |
| Multi-task Language Understanding | MMLU (test) | Normalized Accuracy | 60.2 | 76 |
| Commonsense Reasoning | HellaSwag (test) | Accuracy | 74.5 | 45 |