
Grid Games: The Power of Multiple Grids for Quantizing Large Language Models

About

A major recent advance in quantization comes from microscaled 4-bit formats such as NVFP4 and MXFP4, which quantize values in small groups that share a scale and assume a fixed floating-point grid. In this paper, we study a natural extension: for each group of values, we are free to select the "better" of two or more 4-bit grids, with the choice marked by one or more bits in the scale value. We formalize the power-of-two-grids (PO2) problem and provide theoretical results showing that practical small-group formats such as MXFP or NVFP can benefit significantly from PO2 grids, while the advantage vanishes for very large groups. On the practical side, we instantiate several grid families: 1) PO2(NF4), which pairs the standard NF4 normal grid with a learned grid; 2) MPO2, a grid pair fully learned over real weights and activations; 3) PO2(Split87), an explicit-zero asymmetric grid; and 4) SFP4, a TensorCore-implementable triple that pairs NVFP4 with two shifted variants. Results for post-training quantization of standard open models and pre-training of Llama-like models show that adaptive grids consistently improve accuracy over single-grid FP4 under both weight-only and weight+activation quantization. Source code is available at https://github.com/IST-DASLab/GridGames.
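The per-group grid selection described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it pairs the FP4 (E2M1) magnitudes with a uniform INT4-style grid chosen here purely for illustration (it is not one of the paper's grid families), uses an unquantized absmax scale per group, and picks whichever grid gives the lower squared error. The function names `quantize_group`, `quantize_po2`, and `quantize_tensor` are hypothetical.

```python
from typing import List, Sequence, Tuple

# FP4 (E2M1) magnitudes used for MXFP4/NVFP4 elements; the sign is handled separately.
FP4_E2M1 = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
# Hypothetical second grid, for illustration only: uniform INT4-style magnitudes.
UNIFORM_INT4 = (0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0)

GRIDS = (FP4_E2M1, UNIFORM_INT4)


def quantize_group(values: Sequence[float], grid: Sequence[float]) -> Tuple[float, List[float]]:
    """Absmax-scale a group onto a symmetric grid and round each value to the nearest point."""
    absmax = max(abs(v) for v in values)
    # Assumption: the scale is kept in full precision (real formats quantize it).
    scale = absmax / grid[-1] if absmax > 0 else 1.0
    dequant = []
    for v in values:
        # Nearest grid magnitude; ties resolve to the smaller magnitude.
        mag = min(grid, key=lambda g: abs(abs(v) / scale - g))
        dequant.append(mag * scale if v >= 0 else -mag * scale)
    return scale, dequant


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize_po2(values: Sequence[float], grids=GRIDS) -> Tuple[int, float, List[float]]:
    """Try every candidate grid and keep the one with the lowest squared error.

    Returns (grid_index, scale, dequantized values). With two grids the index
    needs one bit, which the abstract describes as carried in the scale value.
    """
    best = None
    for idx, grid in enumerate(grids):
        scale, deq = quantize_group(values, grid)
        err = squared_error(values, deq)
        if best is None or err < best[0]:
            best = (err, idx, scale, deq)
    _, idx, scale, deq = best
    return idx, scale, deq


def quantize_tensor(xs: Sequence[float], group_size: int = 16) -> List[Tuple[int, float, List[float]]]:
    """Split a flat tensor into groups (16 matches NVFP4's block size) and quantize each."""
    return [quantize_po2(xs[i:i + group_size]) for i in range(0, len(xs), group_size)]


# Usage: the first group lies exactly on the FP4 grid, the second on the uniform grid.
for idx, scale, deq in quantize_tensor([6.0, 3.0, -1.5, 0.5, 7.0, 5.0, -3.0, 1.0], group_size=4):
    print(idx, scale, deq)
```

Because the single-grid FP4 result is always one of the candidates, per-group selection can never increase the group's error; the open question the paper addresses is how much it reduces it and how the benefit depends on group size. A real format would also quantize the scale and pick grids designed for the data, such as the NF4, learned, Split87, or shifted-NVFP4 grids listed above.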

Vage Egiazarian, Erik Schultheis, Andrei Panferov, Earl Killian, Torsten Hoefler, Dan Alistarh• 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Language Model Evaluation | Winogrande, ARC-C, ARC-E, Lambada, PIQA, Hellaswag, MMLU, IFEval, and GSM8K-CoT (mixed standard 10-shot prompt) | Accuracy: 80.3 | 88 |
| Quantization Distribution Evaluation | C4 (calibration set) | KL Divergence (Top 10): 0.0364 | 11 |
| Quantization Distribution Evaluation | Wiki2 (calibration set) | KL Divergence: 0.0813 | 11 |
