Saliency-Aware Regularized Quantization Calibration for Large Language Models

About

Post-training quantization (PTQ) is an effective approach for deploying large language models (LLMs) under memory and latency constraints. Most existing PTQ methods determine quantization parameters by minimizing a layer-wise reconstruction error on a predetermined calibration dataset, typically optimized via either scale search or Gram-based methods. However, from the perspective of generalization risk, existing PTQ calibration objectives based solely on empirical reconstruction error over limited or unrepresentative calibration data may move the quantized weights away from the original floating-point weights, potentially degrading downstream performance. To address this issue, we propose Regularized Quantization Calibration (RQC), a unified framework that augments standard PTQ objectives with a regularizer that explicitly controls weight deviation from the original weights. We further generalize this framework to incorporate a saliency-aware regularizer, resulting in Saliency-Aware Regularized Quantization Calibration (SARQC). The proposed regularization encourages quantized weights to remain close to the original weights during calibration, leading to improved generalization at inference time. SARQC integrates seamlessly into existing PTQ pipelines and enhances both scale-search-based and Gram-based methods under a unified formulation. Extensive experiments on dense and Mixture-of-Experts LLMs demonstrate consistent improvements in perplexity and zero-shot accuracy, without introducing additional inference overhead.
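The idea of adding a weight-deviation regularizer to a layer-wise reconstruction objective can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the exact form of the penalty (a squared deviation, optionally weighted per input channel), the saliency proxy (mean squared calibration activation), the symmetric round-to-nearest quantizer, and all function names (`quantize`, `regularized_loss`, `search_scale`) are assumptions made only for illustration.

```python
import numpy as np

def quantize(w, scale, bits=4):
    """Symmetric round-to-nearest quantization with a per-row scale (assumed quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def regularized_loss(w, w_q, x, lam, saliency=None):
    """Layer-wise reconstruction error plus a weight-deviation penalty.

    Assumed form: ||x (w - w_q)^T||^2 + lam * sum(saliency * (w - w_q)^2).
    With saliency=None the penalty is the plain squared deviation.
    """
    delta = w - w_q
    recon = np.sum((x @ delta.T) ** 2)
    weights = 1.0 if saliency is None else saliency
    penalty = np.sum(weights * delta ** 2)
    return recon + lam * penalty

def search_scale(w, x, lam=0.0, saliency=None, bits=4, n_grid=20):
    """Grid search over one shared clipping ratio; returns (loss, ratio, w_q)."""
    qmax = 2 ** (bits - 1) - 1
    # Guard against all-zero rows so the scale is never exactly zero.
    base = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-12) / qmax
    best = (np.inf, None, None)
    for ratio in np.linspace(0.5, 1.0, n_grid):
        w_q = quantize(w, base * ratio, bits)
        loss = regularized_loss(w, w_q, x, lam, saliency)
        if loss < best[0]:
            best = (loss, ratio, w_q)
    return best

# Toy usage: few calibration samples relative to the input dimension.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))   # (out_features, in_features)
x = rng.normal(size=(4, 16))   # (n_samples, in_features)
saliency = np.mean(x ** 2, axis=0, keepdims=True)  # hypothetical per-channel saliency

_, ratio_plain, _ = search_scale(w, x, lam=0.0)
_, ratio_reg, _ = search_scale(w, x, lam=1.0, saliency=saliency)
print("ratio without regularizer:", ratio_plain)
print("ratio with saliency-weighted regularizer:", ratio_reg)
```

Under the unweighted form assumed here, the regularized objective equals tr(Δ (G + λI) Δᵀ) with Δ = W − Q(W) and G = XᵀX, so in a Gram-based solver the penalty would amount to adding λ to the diagonal of the Gram matrix. Whether the paper uses this exact form is not stated in the abstract.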

Yanlong Zhao, Xiaoyuan Cheng, Huihang Liu, Baihua He, Xinyu Zhang, Harrison Bo Hua Zhu, Wenlong Chen, Li Zeng, Zhuo Sun • 2026

Related benchmarks

Task	Dataset	Result
Language Modeling	WikiText-2	Perplexity (PPL): 5.6	2862
Commonsense Reasoning	WinoGrande	--	1581
Commonsense Reasoning	HellaSwag	HellaSwag Accuracy: 61.29	897
Question Answering	PIQA	Accuracy: 80.96	589
Question Answering	ARC-E	Accuracy: 80.6	544
Multi-task Language Understanding	MMLU	MMLU Accuracy: 74.27	456
Sentence Completion	HellaSwag	Accuracy: 63.17	440
Question Answering	ARC-C	Accuracy: 51.02	283
Science Question Answering	ARC-C	Accuracy: 50	268
Reading Comprehension	BoolQ	Accuracy (BoolQ): 82.6	258

Showing 10 of 14 rows
