Beyond Outliers: A Data-Free Layer-wise Mixed-Precision Quantization Approach Driven by Numerical and Structural Dual-Sensitivity

About

Layer-wise mixed-precision quantization (LMPQ) enables effective compression under extreme low-bit settings by allocating higher precision to sensitive layers. However, existing methods typically treat all intra-layer weight modules uniformly and rely on a single numerical property when estimating sensitivity, overlooking their distinct operational roles and structural characteristics. To address this, we propose NSDS, a novel calibration-free LMPQ framework driven by Numerical and Structural Dual-Sensitivity. Specifically, it first mechanistically decomposes each layer into distinct operational roles and quantifies their sensitivity from both numerical and structural perspectives. These dual-aspect scores are then aggregated into a unified layer-wise metric through a robust aggregation scheme based on MAD-Sigmoid and Soft-OR to guide bit allocation. Extensive experiments demonstrate that NSDS consistently achieves superior performance compared to various baselines across diverse models and downstream tasks, without relying on any calibration data.
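The abstract names two concrete ingredients of the aggregation scheme: MAD-Sigmoid normalization and Soft-OR fusion. As a rough illustration of how such a scheme could combine dual sensitivity scores, here is a minimal NumPy sketch. It is a guess at the mechanism, not the paper's implementation: the function names (mad_sigmoid, soft_or, layer_sensitivity), the exact normalization, the eps constant, and the toy top-k bit-allocation rule are all our assumptions.

```python
import numpy as np

def mad_sigmoid(scores, eps=1e-8):
    """Robustly standardize with median/MAD, then squash to (0, 1).

    Hypothetical reading of "MAD-Sigmoid": z-scoring against the median
    absolute deviation (robust to outlier layers), then a sigmoid so all
    scores land on a comparable (0, 1) scale.
    """
    scores = np.asarray(scores, dtype=np.float64)
    med = np.median(scores)
    mad = np.median(np.abs(scores - med)) + eps
    return 1.0 / (1.0 + np.exp(-(scores - med) / mad))

def soft_or(a, b):
    """Probabilistic (noisy) OR: high if EITHER input score is high.

    Hypothetical reading of "Soft-OR": a layer counts as sensitive when
    either its numerical or its structural score flags it.
    """
    return 1.0 - (1.0 - a) * (1.0 - b)

def layer_sensitivity(numerical_scores, structural_scores):
    """Aggregate per-layer dual-aspect scores into one layer-wise metric."""
    return soft_or(mad_sigmoid(numerical_scores),
                   mad_sigmoid(structural_scores))

# Toy usage: give higher bit-width to the most sensitive layers.
if __name__ == "__main__":
    numerical = [0.12, 0.95, 0.30, 0.28]   # placeholder numerical scores
    structural = [0.40, 0.10, 0.88, 0.35]  # placeholder structural scores
    sens = layer_sensitivity(numerical, structural)
    order = np.argsort(-sens)              # most sensitive layers first
    bits = np.full(len(sens), 2)           # 2-bit base precision (assumed)
    bits[order[:2]] = 4                    # top-2 sensitive layers get 4 bits
    print(sens.round(3), bits)
```

The Soft-OR step reflects one plausible design choice: unlike averaging, it keeps a layer at high precision if either view deems it sensitive, so a layer that looks numerically benign but structurally critical is not underweighted.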

Hengyuan Zhang, Xinrong Chen, Zunhai Su, Xiao Liang, Jing Xiong, Wendong Xu, He Xiao, Chaofan Tao, Wei Zhang, Ruobing Xie, Lei Jiang, Hayden Kwok-Hay So, Ngai Wong • 2026

Related benchmarks

Task                   Dataset                    Metric      Result  Rank
Language Modeling      C4                         Perplexity  7.99    1071
Commonsense Reasoning  PIQA                       Accuracy    75.25   751
Commonsense Reasoning  HellaSwag                  Accuracy    77.34   213
Commonsense Reasoning  BoolQ                      Accuracy    78.62   212
Commonsense Reasoning  WinoGrande                 Accuracy    74.28   189
Language Modeling      WikiText2                  Perplexity  6.23    162
Reasoning              PIQA                       Accuracy    76.78   145
Reasoning              ARC-C                      Accuracy    58.27   80
Commonsense Reasoning  TruthfulQA                 Accuracy    31.15   28
Language Modeling      Language Modeling Average  Perplexity  7.11    12

(Showing 10 of 12 rows)
