CALM: A CKA-Guided Adaptive Layer-Wise Modularization Framework for LLM Quantization
About
Current mainstream post-training quantization (PTQ) methods for large language models typically apply a uniform quantization strategy across all network layers, overlooking the substantial differences in algorithmic suitability among layers. To address this limitation, we propose CALM (CKA-guided Adaptive Layer-wise Modularization), a fine-tuning-free, plug-and-play framework for algorithmically heterogeneous quantization. CALM independently evaluates multiple PTQ algorithms on each layer and employs linear Centered Kernel Alignment (CKA) as a metric to automatically select the optimal quantization strategy per layer. The individually optimized strategies are then integrated to construct a hybrid quantized model. Experiments demonstrate that our approach consistently outperforms both uniform quantization baselines and state-of-the-art mixed-precision methods across mainstream LLMs, including LLaMA and Qwen, in terms of perplexity (PPL) and downstream task performance.
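The per-layer selection step described above can be sketched as follows. This is a minimal illustration, not the authors' released code: `linear_cka` implements the standard linear CKA similarity between two activation matrices, and the hypothetical helper `select_per_layer` picks, for each layer, the candidate PTQ algorithm whose quantized activations align best with the full-precision ones.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices of shape (n_samples, n_features).

    CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) after centering.
    """
    X = X - X.mean(axis=0, keepdims=True)  # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

def select_per_layer(fp_acts, quant_acts_by_algo):
    """Pick the PTQ algorithm with the highest CKA to full precision, per layer.

    fp_acts: list of per-layer full-precision activations on calibration data.
    quant_acts_by_algo: dict mapping algorithm name -> list of per-layer
    activations produced by that algorithm's quantized model.
    """
    choices = []
    for i, fp in enumerate(fp_acts):
        best = max(
            quant_acts_by_algo,
            key=lambda algo: linear_cka(fp, quant_acts_by_algo[algo][i]),
        )
        choices.append(best)
    return choices
```

The selected per-layer algorithms would then be stitched together into the final hybrid quantized model; CKA is attractive here because it is invariant to isotropic scaling and orthogonal transforms of the representations, so it compares layers on representational similarity rather than raw activation error.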
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 7.6 | 1541 |
| Commonsense Reasoning | HellaSwag | Accuracy | 77.13 | 1460 |
| Multi-task Language Understanding | MMLU | Accuracy | 65.87 | 842 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 68 | 797 |
| Code Generation | HumanEval (test) | -- | -- | 444 |
| Language Modeling | C4 (val) | PPL | 12.72 | 392 |
| Language Modeling | WikiText-2 (val) | Perplexity (PPL) | 6.89 | 277 |
| Math Reasoning | GSM8K | Accuracy | 74.33 | 126 |
| Multi-task Language Understanding | MMLU (test) | Normalized Accuracy | 60.2 | 76 |
| Commonsense Reasoning | HellaSwag (test) | Accuracy | 74.5 | 45 |