PACT: Parameterized Clipping Activation for Quantized Neural Networks
About
Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed, but most of these techniques focus on quantizing weights, which are relatively small in size compared to activations. This paper proposes a novel quantization scheme for activations during training that enables neural networks to work well with ultra-low-precision weights and activations without significant accuracy degradation. This technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter $\alpha$ that is optimized during training to find the right quantization scale. PACT allows quantizing activations to arbitrary bit precisions while achieving much better accuracy than published state-of-the-art quantization schemes. We show, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full-precision networks across a range of popular models and datasets. We also show that exploiting these reduced-precision computational units in hardware can enable a super-linear improvement in inference performance, due to a significant reduction in the area of accelerator compute engines coupled with the ability to retain the quantized model and activation data in on-chip memories.
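To make the role of $\alpha$ concrete, here is a minimal NumPy sketch of a clipping activation with a learnable upper bound. The abstract does not give the exact formula, so the sketch assumes the commonly described form: activations are clipped to $[0, \alpha]$, then quantized uniformly to `bits` bits, and rounding is treated as the identity in the backward pass (a straight-through estimator). The function names `pact_forward` and `pact_backward` are hypothetical and chosen only for illustration.

```python
import numpy as np

def pact_forward(x, alpha, bits):
    """Clip activations to [0, alpha], then quantize uniformly to `bits` bits.

    Assumed form, not taken from the abstract itself.
    """
    levels = 2 ** bits - 1                 # number of quantization steps
    y = np.clip(x, 0.0, alpha)             # parameterized clipping
    scale = levels / alpha                 # maps [0, alpha] onto [0, levels]
    return np.round(y * scale) / scale     # grid with step alpha / levels

def pact_backward(x, alpha, grad_out):
    """Straight-through gradients: rounding is treated as the identity."""
    # Inputs strictly inside the clipping range pass the gradient through.
    grad_x = grad_out * ((x > 0.0) & (x < alpha))
    # Inputs at or above alpha are clipped, so their gradient flows to alpha.
    grad_alpha = np.sum(grad_out * (x >= alpha))
    return grad_x, grad_alpha

x = np.array([-1.0, 0.3, 1.2, 5.0])
y_q = pact_forward(x, alpha=3.0, bits=2)   # 2 bits -> grid {0, 1, 2, 3}
grad_x, grad_alpha = pact_backward(x, 3.0, np.ones_like(x))
```

With `alpha=3.0` and `bits=2`, the quantization grid has step 1, so the example input maps to `[0, 0, 1, 3]`. Only the input that exceeds `alpha` contributes to `grad_alpha`, which is how training can raise or lower the clipping level to balance clipping error against quantization resolution.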
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 17.49 | 1541 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 76.5 | 1453 |
| Image Classification | ImageNet (val) | Top-1 Acc | 76.5 | 1206 |
| Language Modeling | WikiText-103 (test) | Perplexity | 16.76 | 524 |
| Natural Language Understanding | GLUE | SST-2 | 89.45 | 452 |
| Image Classification | ImageNet-1k (val) | Top-1 Acc | 69.2 | 287 |
| Summarization | XSum (test) | ROUGE-2 | 16.6 | 231 |
| Image Classification | ImageNet-1k (val) | Top-1 Acc | 61.4 | 188 |
| Language Modeling | Penn Treebank (PTB) (test) | Perplexity | 16.11 | 120 |
| Image Classification | ImageNet (val) | Accuracy | 69.2 | 115 |