Saliency-Aware Regularized Quantization Calibration for Large Language Models

About

Post-training quantization (PTQ) is an effective approach for deploying large language models (LLMs) under memory and latency constraints. Most existing PTQ methods determine quantization parameters by minimizing a layer-wise reconstruction error on a predetermined calibration dataset, typically optimized via either scale search or Gram-based methods. However, from the perspective of generalization risk, existing PTQ calibration objectives based solely on empirical reconstruction error over limited or unrepresentative calibration data may move the quantized weights away from the original floating-point weights, potentially degrading downstream performance. To address this issue, we propose Regularized Quantization Calibration (RQC), a unified framework that augments standard PTQ objectives with a regularizer that explicitly controls weight deviation from the original weights. We further generalize this framework to incorporate a saliency-aware regularizer, resulting in Saliency-Aware Regularized Quantization Calibration (SARQC). The proposed regularization encourages quantized weights to remain close to the original weights during calibration, leading to improved generalization at inference time. SARQC integrates seamlessly into existing PTQ pipelines and enhances both scale-search-based and Gram-based methods under a unified formulation. Extensive experiments on dense and Mixture-of-Experts LLMs demonstrate consistent improvements in perplexity and zero-shot accuracy, without introducing additional inference overhead.
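
To make the idea in the abstract concrete, the following is a minimal sketch of a scale-search calibration whose objective adds a weight-deviation penalty to the layer-wise reconstruction error. It is not the authors' implementation. The abstract does not give the exact form of the regularizer or the definition of saliency, so everything specific here is an assumption: the squared-error penalty, the weight `lam`, the per-input-channel `saliency` vector, the grid of clipping ratios, and the function names `quantize` and `regularized_scale_search`.

```python
import numpy as np

def quantize(w, scale, n_bits=4):
    """Symmetric uniform round-to-nearest quantization (quantize, then dequantize)."""
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def regularized_scale_search(W, X, lam=0.0, saliency=None, n_bits=4, n_grid=50):
    """Per-output-channel grid search over clipping ratios (hypothetical sketch).

    W: (out_features, in_features) float weights.
    X: (n_samples, in_features) calibration activations.
    Assumed objective for each row w with quantized row w_q:
        ||X w - X w_q||^2 + lam * sum_j saliency_j * (w_j - w_q_j)^2
    lam = 0 recovers plain reconstruction-error scale search.
    """
    if saliency is None:
        saliency = np.ones(W.shape[1])  # uniform saliency: plain deviation penalty
    qmax = 2 ** (n_bits - 1) - 1
    W_q = np.empty_like(W)
    for i, w in enumerate(W):
        max_abs = np.abs(w).max()
        best_loss, best_wq = np.inf, w.copy()
        for ratio in np.linspace(0.3, 1.0, n_grid):
            # Candidate step size from a clipped dynamic range.
            scale = max(ratio * max_abs / qmax, 1e-12)
            w_q = quantize(w, scale, n_bits)
            diff = w - w_q
            recon = np.sum((X @ diff) ** 2)            # empirical reconstruction error
            reg = lam * np.sum(saliency * diff ** 2)   # assumed deviation penalty
            if recon + reg < best_loss:
                best_loss, best_wq = recon + reg, w_q
        W_q[i] = best_wq
    return W_q

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))
X = rng.normal(size=(8, 16))
W_plain = regularized_scale_search(W, X, lam=0.0)
W_reg = regularized_scale_search(W, X, lam=100.0)
```

With uniform saliency, raising `lam` can only keep or reduce the total squared deviation between the selected quantized weights and the original weights, which is the behavior the abstract attributes to the regularizer. Because the penalty only changes which scale is chosen during calibration, the quantized layer has the same format and inference cost either way.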

Yanlong Zhao, Xiaoyuan Cheng, Huihang Liu, Baihua He, Xinyu Zhang, Harrison Bo Hua Zhu, Wenlong Chen, Li Zeng, Zhuo Sun • 2026

Related benchmarks

Task                               Dataset     Result                     Rank
Language Modeling                  WikiText-2  Perplexity (PPL): 5.6      2320
Commonsense Reasoning              WinoGrande  --                         1442
Commonsense Reasoning              HellaSwag   HellaSwag Accuracy: 61.29  711
Question Answering                 ARC-E       Accuracy: 80.6             523
Question Answering                 PIQA        Accuracy: 80.96            505
Multi-task Language Understanding  MMLU        MMLU Accuracy: 74.27       442
Sentence Completion                HellaSwag   Accuracy: 63.17            364
Science Question Answering         ARC-C       Accuracy: 50               261
Question Answering                 ARC-C       Accuracy: 51.02            258
Science Question Answering         ARC-E       Accuracy: 80.6             240
Showing 10 of 14 rows
