
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

About

We consider the problem of model compression for deep neural networks (DNNs) in the challenging one-shot/post-training setting, in which we are given an accurate trained model, and must compress it without any retraining, based only on a small amount of calibration input data. This problem has become popular in view of the emerging software and hardware support for executing models compressed via pruning and/or quantization with speedup, and well-performing solutions have been proposed independently for both compression approaches. In this paper, we introduce a new compression framework which covers both weight pruning and quantization in a unified setting, is time- and space-efficient, and considerably improves upon the practical performance of existing post-training methods. At the technical level, our approach is based on an exact and efficient realization of the classical Optimal Brain Surgeon (OBS) framework of [LeCun, Denker, and Solla, 1990] extended to also cover weight quantization at the scale of modern DNNs. From the practical perspective, our experimental results show that it can improve significantly upon the compression-accuracy trade-offs of existing post-training methods, and that it can enable the accurate compound application of both pruning and quantization in a post-training setting.
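The core OBS idea the abstract refers to can be sketched as follows: given the (inverse) Hessian of the layer loss, remove the weight with the smallest saliency w_q² / (2 [H⁻¹]_qq) and optimally adjust the remaining weights to compensate; the same update with a rounding target instead of zero covers quantization. This is a minimal NumPy illustration of those classical update rules, not the paper's exact, efficient realization (the function names and the toy Hessian are ours):

```python
import numpy as np

def obs_prune_step(w, H_inv):
    """One classical OBS step: remove the lowest-saliency weight and
    apply the compensating update to the remaining weights.
    Sketch only; OBC's exact algorithm is more efficient than this."""
    diag = np.diag(H_inv)
    saliency = w ** 2 / (2.0 * diag)          # cost of zeroing each weight
    q = int(np.argmin(saliency))              # cheapest weight to remove
    w = w - (w[q] / H_inv[q, q]) * H_inv[:, q]  # optimal compensation
    w[q] = 0.0                                # enforce exact zero
    return w, q

def obs_quant_step(w, H_inv, q, grid):
    """Same update rule, but the target is the nearest grid point
    instead of zero -- this is how the framework extends to quantization."""
    target = grid[np.argmin(np.abs(grid - w[q]))]
    w = w - ((w[q] - target) / H_inv[q, q]) * H_inv[:, q]
    w[q] = target
    return w

# Toy usage with a 2x2 positive-definite Hessian.
H = np.array([[2.0, 0.5], [0.5, 1.0]])
w = np.array([0.1, 1.0])
w_pruned, q = obs_prune_step(w.copy(), np.linalg.inv(H))
w_quant = obs_quant_step(w.copy(), np.linalg.inv(H), 1, np.array([0.0, 0.5, 1.5]))
```

In the toy example the small weight (index 0) is pruned and the surviving weight grows slightly to absorb the error, which is exactly the compensation effect that distinguishes OBS-style methods from magnitude pruning.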

Elias Frantar, Sidak Pal Singh, Dan Alistarh • 2022

Related benchmarks

Task                  Dataset             Metric          Result   Rank
Image Classification  ImageNet-1k (val)   Top-1 Accuracy  75.72    512
Image Classification  ImageNet            Top-1 Accuracy  75.64    324
Image Classification  ImageNet (val)      Accuracy        75.2     300
Object Detection      COCO                AP50 (Box)      66.14    190
Image Classification  ImageNet (val)      Top-1 Accuracy  75.5     118
Question Answering    SQuAD v1.1          F1              87.81    79
Question Answering    SQuAD v1.1 (val)    F1 Score        86.97    70
Image Classification  ImageNet-1k (val)   Accuracy        71.47    5

Other info

Code
