
Quantization without Tears

About

Deep neural networks, while achieving remarkable success across diverse tasks, demand significant resources, including computation, GPU memory, bandwidth, storage, and energy. Network quantization, a standard compression and acceleration technique, reduces storage costs and enables potential inference acceleration by discretizing network weights and activations into a finite set of integer values. However, current quantization methods are often complex and sensitive, requiring extensive task-specific hyperparameters, where even a single misconfiguration can impair model performance, limiting generality across different models and tasks. In this paper, we propose Quantization without Tears (QwT), a method that simultaneously achieves quantization speed, accuracy, simplicity, and generality. The key insight of QwT is to incorporate a lightweight additional structure into the quantized network to mitigate information loss during quantization. This structure consists solely of a small set of linear layers, keeping the method simple and efficient. More importantly, it admits a closed-form solution, allowing us to improve accuracy effortlessly in under 2 minutes. Extensive experiments across various vision, language, and multimodal tasks demonstrate that QwT is both highly effective and versatile. Our approach offers a robust solution for network quantization that combines simplicity, accuracy, and adaptability, providing new insights for the design of novel quantization paradigms. The code is publicly available at https://github.com/wujx2001/QwT
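The closed-form step can be sketched as a ridge-regularized least-squares fit: given quantized block inputs, quantized block outputs, and the corresponding full-precision outputs collected on a calibration batch, solve for a linear layer (W, b) whose output, added to the quantized output, best recovers the full-precision one. This is a minimal illustration of the idea, not the authors' implementation; the function name, shapes, and the `lam` regularizer are assumptions for the example.

```python
import numpy as np

def fit_compensation(x_q, y_q, y_fp, lam=1e-4):
    """Closed-form fit of a linear compensation layer (QwT-style sketch).

    Solves for W, b such that  y_q + x_q @ W + b  approximates y_fp.
    x_q:  (n, d_in)  quantized block inputs on calibration data
    y_q:  (n, d_out) quantized block outputs
    y_fp: (n, d_out) full-precision block outputs
    lam:  ridge regularizer (hypothetical choice for numerical stability)
    """
    residual = y_fp - y_q  # information lost to quantization
    # Append a column of ones so the bias is solved jointly with W.
    X = np.hstack([x_q, np.ones((x_q.shape[0], 1))])
    # Regularized normal equations: (X^T X + lam*I) beta = X^T residual
    A = X.T @ X + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, X.T @ residual)
    return beta[:-1], beta[-1]  # W: (d_in, d_out), b: (d_out,)
```

Because the solution is a single linear solve per layer, fitting all compensation modules requires only one calibration pass and no gradient-based training, which is what makes the procedure fast.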

Minghao Fu, Hao Yu, Jie Shao, Junjie Zhou, Ke Zhu, Jianxin Wu • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Language Modeling | WikiText2 | Perplexity | 6.63 | 1875
Image Classification | ImageNet (val) | Top-1 Acc | 84 | 1206
Language Modeling | C4 | Perplexity | 9.38 | 1182
Image Generation | ImageNet 256x256 (val) | FID | 5.35 | 307
Instance Segmentation | COCO | AP (Mask) | 45 | 279
Object Detection | COCO | AP (Box) | 51.8 | 144
Zero-shot Image Classification | ImageNet zero-shot | Top-1 Acc | 63 | 35
Commonsense Question Answering | Commonsense QA (8 datasets) | Average QA Score | 65.18 | 3

Other info

Code: https://github.com/wujx2001/QwT
