
CALM: A CKA-Guided Adaptive Layer-Wise Modularization Framework for LLM Quantization

About

Current mainstream post-training quantization (PTQ) methods for large language models typically apply a uniform quantization strategy across all network layers, overlooking the substantial differences in algorithmic suitability among layers. To address this limitation, we propose CALM (CKA-guided Adaptive Layer-wise Modularization), a fine-tuning-free, plug-and-play framework for algorithmically heterogeneous quantization. CALM independently evaluates multiple PTQ algorithms on each layer and employs Linear Centered Kernel Alignment (CKA) as a metric to automatically select the optimal quantization strategy per layer. The individually optimized strategies are then integrated to construct a hybrid quantized model. Experiments demonstrate that our approach consistently outperforms both uniform quantization baselines and state-of-the-art mixed-precision methods across mainstream LLMs, including LLaMA and Qwen, in terms of perplexity (PPL) and downstream task performance.
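The per-layer selection described above can be sketched with linear CKA. The snippet below is a minimal illustration, not the paper's implementation: `linear_cka` computes the standard linear CKA similarity between two activation matrices, and the hypothetical helper `select_per_layer` picks, for each layer, the candidate PTQ algorithm whose quantized activations are most CKA-similar to the full-precision reference.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between activation matrices.

    X, Y: (n_samples, n_features) activations for the same layer
    (e.g. full-precision reference vs. a quantized variant).
    Returns a similarity in [0, 1]; higher means more similar.
    """
    # Center each feature column so the implied Gram matrices are centered.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return num / den

def select_per_layer(ref_acts, candidate_acts):
    """Hypothetical per-layer algorithm selection (names are illustrative).

    ref_acts: list of per-layer reference activation matrices.
    candidate_acts: {algo_name: list of per-layer activation matrices}.
    Returns the name of the best-matching algorithm for each layer.
    """
    choices = []
    for i, ref in enumerate(ref_acts):
        best = max(candidate_acts,
                   key=lambda algo: linear_cka(ref, candidate_acts[algo][i]))
        choices.append(best)
    return choices
```

Note that linear CKA is invariant to isotropic scaling and orthogonal transformations of the activations, which makes it a convenient layer-wise fidelity score independent of each layer's activation magnitude.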

Jinhao Zhang, Yunquan Zhang, Daning Chen, Jun Sun, Zicheng Yan • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Language Modeling | WikiText-2 (test) | PPL | 7.6 | 1541
Commonsense Reasoning | HellaSwag | Accuracy | 77.13 | 1460
Multi-task Language Understanding | MMLU | Accuracy | 65.87 | 842
Mathematical Reasoning | GSM8K (test) | Accuracy | 68 | 797
Code Generation | HumanEval (test) | -- | -- | 444
Language Modeling | C4 (val) | PPL | 12.72 | 392
Language Modeling | WikiText-2 (val) | Perplexity (PPL) | 6.89 | 277
Mathematical Reasoning | GSM8K | Accuracy | 74.33 | 126
Multi-task Language Understanding | MMLU (test) | Normalized Accuracy | 60.2 | 76
Commonsense Reasoning | HellaSwag (test) | Accuracy | 74.5 | 45
