CALM: A CKA-Guided Adaptive Layer-Wise Modularization Framework for LLM Quantization

About

Current mainstream post-training quantization (PTQ) methods for large language models typically apply a uniform quantization strategy across all network layers, overlooking the substantial differences in algorithmic suitability among layers. To address this limitation, we propose CALM (CKA-guided Adaptive Layer-wise Modularization), a fine-tuning-free, plug-and-play framework for algorithmically heterogeneous quantization. CALM independently evaluates multiple PTQ algorithms on each layer and employs Linear Centered Kernel Alignment (CKA) as a metric to automatically select the optimal quantization strategy per layer. The individually optimized strategies are then integrated to construct a hybrid quantized model. Experiments demonstrate that our approach consistently outperforms both uniform quantization baselines and state-of-the-art mixed-precision methods on mainstream LLMs, including LLaMA and Qwen, in terms of perplexity (PPL) and downstream task performance.
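
As a rough illustration of the selection criterion described above (not the paper's released code), the sketch below computes linear CKA in its feature-space form and then picks, for each layer, the candidate PTQ algorithm whose quantized activations align best with the full-precision reference on a calibration set. The names linear_cka, select_per_layer, fp_acts, and quant_acts are illustrative assumptions, not the paper's API.

```python
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, n_features).

    Here X would hold a layer's full-precision outputs on a calibration set
    and Y the same layer's outputs under one candidate PTQ algorithm.
    """
    X = X - X.mean(dim=0, keepdim=True)  # center each feature; CKA assumes centering
    Y = Y - Y.mean(dim=0, keepdim=True)
    # Feature-space form of linear CKA (Kornblith et al., 2019):
    #   CKA(X, Y) = ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = torch.linalg.matrix_norm(X.T @ Y) ** 2
    den = torch.linalg.matrix_norm(X.T @ X) * torch.linalg.matrix_norm(Y.T @ Y)
    return (num / den).item()

def select_per_layer(fp_acts: dict, quant_acts: dict) -> dict:
    """For each layer, keep the PTQ algorithm whose output stays closest
    (highest linear CKA) to the full-precision reference.

    fp_acts:    {layer_name: Tensor}               full-precision activations
    quant_acts: {algo_name: {layer_name: Tensor}}  per-algorithm activations
    """
    return {
        layer: max(quant_acts, key=lambda algo: linear_cka(ref, quant_acts[algo][layer]))
        for layer, ref in fp_acts.items()
    }
```

In this formulation, a higher score means the quantized layer's representation is closer, up to orthogonal transformation and isotropic scaling, to the full-precision one, which is why it can rank quantization algorithms per layer without any fine-tuning.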

Jinhao Zhang, Yunquan Zhang, Daning Chen, Jun Sun, Zicheng Yan • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Language Modeling | WikiText-2 (test) | PPL | 7.6 | 1949
Commonsense Reasoning | HellaSwag | Accuracy | 77.13 | 1891
Mathematical Reasoning | GSM8K (test) | Accuracy | 68 | 900
Multi-task Language Understanding | MMLU | Accuracy | 65.87 | 876
Language Modeling | C4 (val) | PPL | 12.72 | 514
Code Generation | HumanEval (test) | -- | -- | 506
Language Modeling | WikiText-2 (val) | PPL | 6.89 | 387
Mathematical Reasoning | GSM8K | Accuracy | 74.33 | 126
Multi-task Language Understanding | MMLU (test) | Normalized Accuracy | 60.2 | 76
Commonsense Reasoning | HellaSwag (test) | Accuracy | 74.5 | 45
