Achieving binary weight and activation for LLMs using Post-Training Quantization
About
Quantizing large language models (LLMs) to 1-bit precision significantly reduces computational costs, but existing quantization techniques suffer from noticeable performance degradation when weight and activation precisions fall below 4 bits (W4A4). In this paper, we propose a post-training quantization framework with a W(1+1)A(1*4) configuration, where weights are quantized to 1 bit with an additional 1 bit for fine-grained grouping, and activations are quantized to 1 bit with a 4-fold increase in the number of channels. For weight quantization, we propose Hessian-aware fine-grained grouping combined with an EM-based quantization scheme. For activation quantization, we equivalently decompose INT4-quantized activations into a 4 * INT1 format and simultaneously smooth the scaling factors based on quantization errors, which further reduces the activation quantization error. Our method surpasses state-of-the-art (SOTA) LLM quantization baselines on W2A4 across multiple tasks, pushing the boundaries of existing LLM quantization methods toward fully binarized models. Code is available at https://github.com/JimmyCrave/LLM-PTQ-binarization.
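The equivalence behind the 4 * INT1 activation format can be illustrated with a minimal NumPy sketch. It is not taken from the released code and makes several simplifying assumptions: activations are unsigned 4-bit integers in 0..15, scaling factors, zero points, and the error-based smoothing step are omitted, and the function names `decompose_int4_to_bits` and `expand_weights` are hypothetical. Each INT4 activation is split into four bit-planes that are concatenated along the channel axis (the 4-fold channel increase), and the power-of-two factors are folded into a correspondingly expanded weight matrix, so the matrix product is unchanged.

```python
import numpy as np


def decompose_int4_to_bits(x_int4):
    """Split unsigned INT4 activations (values 0..15), shape (n, d),
    into binary activations of shape (n, 4 * d).

    Bit-plane k (k = 0 is the least significant bit) occupies
    columns k * d .. (k + 1) * d - 1 of the result.
    """
    x = np.asarray(x_int4, dtype=np.int64)
    planes = [(x >> k) & 1 for k in range(4)]
    return np.concatenate(planes, axis=1)


def expand_weights(w):
    """Stack four copies of w, shape (d, m), along the input-channel axis,
    scaling copy k by 2**k, to match the layout of the bit-planes above.
    (Illustrative assumption: the scale is folded into the weights here.)
    """
    w = np.asarray(w)
    return np.concatenate([(1 << k) * w for k in range(4)], axis=0)


rng = np.random.default_rng(0)
x_int4 = rng.integers(0, 16, size=(3, 5))   # toy INT4 activations
w = rng.choice([-1, 1], size=(5, 2))        # toy binary weights in {-1, +1}

x_bits = decompose_int4_to_bits(x_int4)     # shape (3, 20), entries in {0, 1}
w_exp = expand_weights(w)                   # shape (20, 2)

# The 1-bit, 4x-channel product reproduces the INT4 product exactly.
assert np.array_equal(x_int4 @ w, x_bits @ w_exp)
```

The identity holds because every INT4 value equals the sum of its bits weighted by powers of two, so the sum over bit-planes can be moved inside the matrix product. Where the power-of-two factors are actually absorbed (weights, scales, or accumulation) is a design choice that this sketch does not settle.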
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity | 7.17 | 3785 |
| Language Modeling | C4 | Perplexity | 10.18 | 1565 |
| Language Modeling | PTB | Perplexity | 37.2 | 1234 |
| Multiple-choice Question Answering | HellaSwag | Accuracy | 55.76 | 196 |
| Language Understanding | MMLU (test) | MMLU Average Accuracy | 28 | 167 |
| Question Answering | QA Suite Zero-shot (PIQA, ARC-E, ARC-C, BoolQ, HellaSwag, WinoGrande) | PIQA Accuracy | 72.09 | 141 |
| Language Modeling | Penn Treebank (PTB) (test) | Perplexity | 69.46 | 130 |
| Commonsense Question Answering | WinoGrande | Accuracy | 58.01 | 73 |
| Commonsense Question Answering | ARC-E | Accuracy | 46.13 | 29 |
| Commonsense Question Answering | ARC Challenge | Accuracy | 30.55 | 21 |