The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
About
Quantizing the weights of large language models (LLMs) from 16-bit to lower bitwidths is the de facto approach for deploying massive transformers onto more affordable accelerators. While GPTQ emerged as one of the standard methods for one-shot post-training quantization at LLM scale, its inner workings are typically described as a sequence of ad-hoc algebraic updates that obscure any geometric meaning or worst-case guarantees. In this work, we show that, when executed back-to-front (from the last to the first dimension) for a linear layer, GPTQ is mathematically identical to Babai's nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer's inputs. This equivalence rests on a sophisticated mathematical argument and has two analytical consequences: first, the GPTQ error-propagation step gains an intuitive geometric interpretation; second, GPTQ inherits the error upper bound of Babai's algorithm under the assumption that no weights are clipped. Leveraging this bound, we design post-training quantization methods that avoid clipping and outperform the original GPTQ. In addition, we provide efficient GPU inference kernels for the resulting representation. Taken together, these results place GPTQ on a firm theoretical footing and open the door to importing decades of progress in lattice algorithms into the design of future quantization algorithms for billion-parameter models. Source code is available at https://github.com/IST-DASLab/GPTQ-Babai.
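To make the equivalence concrete, here is a minimal sketch of Babai's nearest plane algorithm, the lattice routine the abstract identifies with back-to-front GPTQ. It rounds one coordinate at a time, last dimension first, and propagates the residual to the remaining dimensions, mirroring GPTQ's error-propagation step. This is an illustrative NumPy implementation of the classical algorithm, not code from the paper's repository; the function name and QR-based formulation are our own choices.

```python
import numpy as np

def babai_nearest_plane(B, t):
    """Approximate the lattice point closest to target t, where the
    lattice is the set of integer combinations of the columns of B.

    Uses the QR factorization B = Q R: the columns of Q span the
    Gram-Schmidt directions, so nearest-plane rounding becomes a
    back-substitution with rounding, run back-to-front -- the same
    last-to-first order in which GPTQ quantizes coordinates.
    """
    Q, R = np.linalg.qr(B)
    y = Q.T @ t                      # target in the Gram-Schmidt frame
    n = B.shape[1]
    c = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Round coordinate i after subtracting the effect of the
        # already-fixed coordinates j > i (error propagation).
        c[i] = np.round((y[i] - R[i, i + 1:] @ c[i + 1:]) / R[i, i])
    return B @ c, c

# Example: a slightly skewed 2-D lattice basis and a non-lattice target.
B = np.array([[1.0, 0.6],
              [0.0, 1.0]])
t = np.array([0.3, 2.2])
v, c = babai_nearest_plane(B, t)    # v is a lattice point near t
```

In the paper's setting, the role of the basis is played by a factor of the layer-input Hessian, and the integer coefficients correspond to the quantized weights; the sketch above only shows the geometric core of the rounding procedure.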
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 60.98 | 1891 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 2 | 1624 |
| Question Answering | ARC Challenge | -- | -- | 906 |
| Commonsense Reasoning | PIQA | Accuracy | 72.91 | 751 |
| Question Answering | ARC Easy | Accuracy | 60.4 | 597 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 77.53 | 572 |
| Language Modeling | C4 (val) | Perplexity (PPL) | 14.64 | 514 |
| Multitask Language Understanding | MMLU | Accuracy | 67.91 | 413 |
| Language Modeling | WikiText-2 (val) | Perplexity (PPL) | 11.27 | 387 |
| Commonsense Reasoning | WinoGrande | Accuracy | 69.53 | 372 |