Beyond Outliers: A Data-Free Layer-wise Mixed-Precision Quantization Approach Driven by Numerical and Structural Dual-Sensitivity
About
Layer-wise mixed-precision quantization (LMPQ) enables effective compression under extreme low-bit settings by allocating higher precision to sensitive layers. However, existing methods typically treat all intra-layer weight modules uniformly and rely on a single numerical property when estimating sensitivity, overlooking their distinct operational roles and structural characteristics. To address this, we propose NSDS, a novel calibration-free LMPQ framework driven by Numerical and Structural Dual-Sensitivity. Specifically, it first mechanistically decomposes each layer into distinct operational roles and quantifies their sensitivity from both numerical and structural perspectives. These dual-aspect scores are then aggregated into a unified layer-wise metric through a robust aggregation scheme based on MAD-Sigmoid and Soft-OR to guide bit allocation. Extensive experiments demonstrate that NSDS consistently achieves superior performance compared to various baselines across diverse models and downstream tasks, without relying on any calibration data.
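The aggregation step can be sketched in code. The snippet below is a minimal illustration, not the paper's implementation: the exact MAD-Sigmoid and Soft-OR formulas are assumptions. It normalizes per-module sensitivity scores robustly (median/MAD centering followed by a sigmoid) and then combines them with a product-form Soft-OR, so a layer is flagged as sensitive if any of its module scores is high.

```python
import numpy as np

def mad_sigmoid(scores):
    # Robust normalization (assumed form): center by median, scale by the
    # median absolute deviation (MAD), then squash into (0, 1) with a sigmoid.
    scores = np.asarray(scores, dtype=float)
    med = np.median(scores)
    mad = np.median(np.abs(scores - med)) + 1e-8  # guard against zero MAD
    return 1.0 / (1.0 + np.exp(-(scores - med) / mad))

def soft_or(probs):
    # Soft-OR (assumed form): differentiable OR over normalized scores;
    # returns a value near 1 if any input score is near 1.
    probs = np.asarray(probs, dtype=float)
    return 1.0 - np.prod(1.0 - probs)

def layer_sensitivity(numerical_scores, structural_scores):
    # Aggregate dual-aspect module scores into one layer-wise metric,
    # following the MAD-Sigmoid -> Soft-OR pipeline described above.
    num_n = mad_sigmoid(numerical_scores)
    struct_n = mad_sigmoid(structural_scores)
    return soft_or(np.concatenate([num_n, struct_n]))
```

In a bit-allocation loop, layers would be ranked by this unified score and the highest-scoring layers assigned more bits under a fixed average-bit budget.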
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Modeling | C4 | Perplexity 7.99 | 1071 |
| Commonsense Reasoning | PIQA | Accuracy 75.25 | 751 |
| Commonsense Reasoning | HellaSwag | Accuracy 77.34 | 213 |
| Commonsense Reasoning | BoolQ | Accuracy 78.62 | 212 |
| Commonsense Reasoning | WinoGrande | Accuracy 74.28 | 189 |
| Language Modeling | WikiText2 | Perplexity 6.23 | 162 |
| Reasoning | PIQA | Accuracy 76.78 | 145 |
| Reasoning | ARC-C | Accuracy 58.27 | 80 |
| Commonsense Reasoning | TruthfulQA | Accuracy 31.15 | 28 |
| Language Modeling | Language Modeling Average | PPL 7.11 | 12 |