
Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling

About

As large language models have scaled up, interest has grown in low-precision numerical formats such as NVFP4 as a way to improve speed and reduce memory usage. However, quantizing models to NVFP4 remains challenging because the lack of precision generally degrades model performance. In this work, we address this issue with Four Over Six (4/6), a modification to the block-scaled NVFP4 quantization algorithm that reduces quantization error. Unlike integer formats, floating-point formats have non-uniform step sizes, which cause larger quantization error for larger values. 4/6 takes advantage of this by adaptively scaling some blocks to smaller FP4 values, making the distribution of representable values more uniform and reducing quantization error for near-maximal values. We show that 4/6 can be implemented efficiently on modern hardware accelerators, yielding performance gains during both pre-training and inference with minimal computational overhead. In pre-training experiments with the Nemotron 3 Nano 30B-A3B model architecture, we find that 4/6 brings training loss closer to BF16 than current state-of-the-art NVFP4 training recipes do. Our code is available at https://github.com/mit-han-lab/fouroversix.
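The adaptive block-scaling idea can be illustrated with a small, simplified simulation. This is a sketch, not the authors' implementation (the linked repository has that). It assumes blocks of FP4 E2M1 values, whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6. Each block scale is kept in full precision instead of being rounded to FP8 E4M3, and the per-tensor scale is omitted. It also assumes each block chooses between mapping its absolute maximum to 6 (standard NVFP4 scaling) or to 4 by comparing mean squared error. All function names are hypothetical.

```python
# Simplified simulation of "Four Over Six" block scaling (hypothetical helpers).
# Assumptions: full-precision block scales (no FP8 E4M3 rounding), no per-tensor
# scale, and a per-block choice made by comparing mean squared error.

# Non-negative magnitudes representable in FP4 E2M1. Note the step size grows:
# 0.5 up to 2, then 1 up to 4, then a single step of 2 from 4 to 6.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def round_to_fp4(x: float) -> float:
    """Round x to the nearest FP4 E2M1 value, saturating at +/-6.

    Ties go to the smaller magnitude (min() keeps the first minimum),
    which simplifies the round-to-nearest-even used by real hardware.
    """
    mag = min(FP4_GRID, key=lambda g: abs(abs(x) - g))
    return mag if x >= 0 else -mag


def quantize_block(block: list[float], target_max: float) -> list[float]:
    """Scale the block so its absmax maps to target_max, quantize, then dequantize."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / target_max  # kept in full precision in this sketch
    return [round_to_fp4(v / scale) * scale for v in block]


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def quantize_block_4over6(block: list[float]) -> tuple[list[float], float]:
    """Try absmax->6 and absmax->4; keep the lower-error result (ties keep 6)."""
    q6 = quantize_block(block, 6.0)
    q4 = quantize_block(block, 4.0)
    if mse(block, q4) < mse(block, q6):
        return q4, 4.0
    return q6, 6.0


def quantize_tensor(values: list[float], block_size: int = 16) -> tuple[list[float], list[float]]:
    """Quantize a flat list block by block; also return each block's chosen target."""
    out: list[float] = []
    choices: list[float] = []
    for i in range(0, len(values), block_size):
        q, target = quantize_block_4over6(values[i:i + block_size])
        out.extend(q)
        choices.append(target)
    return out, choices


if __name__ == "__main__":
    # Two tiny blocks of size 2 to keep the arithmetic easy to follow.
    deq, picked = quantize_tensor([5.0, 6.0, 1.0, 6.0], block_size=2)
    print(picked)  # → [4.0, 6.0]
    print(deq)     # → [4.5, 6.0, 1.0, 6.0]
```

In the first block, mapping the maximum to 6 leaves 5.0 halfway between the grid points 4 and 6, so it dequantizes with an error of 1.0. Mapping the maximum to 4 instead sends 5.0 to about 3.33, which rounds to 3 and dequantizes to 4.5, halving the error. In the second block the small value 1.0 is exact under standard scaling, so the 6-scaled version is kept. This matches the intuition that the 4-scaled grid is denser near the block maximum.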

Jack Cook, Junxian Guo, Guangxuan Xiao, Yujun Lin, Keith Wyss, Mahdi Nazemi, Asit Mishra, Carlo del Mundo, Tijmen Blankevoort, Song Han • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 | Perplexity (PPL) | 6.16 | 2320 |
| Commonsense Reasoning | HellaSwag | Accuracy | 73.7 | 1896 |
| Question Answering | BoolQ | Accuracy | 86.5 | 317 |
| Zero-shot Reasoning | Reasoning Suite Zero-shot (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (val test) | Average Accuracy | 72.09 | 297 |
| Multiple-choice Question Answering | ARC Easy | Accuracy | 80.8 | 257 |
| Language Model Evaluation | Winogrande, ARC-C, ARC-E, Lambada, PIQA, Hellaswag, MMLU, IFEval, and GSM8K-CoT (Mixed standard 10-shot prompt) | Accuracy | 80.25 | 88 |
| Zero-shot Commonsense Reasoning | ARC-Easy, ARC-Challenge, HellaSwag, PIQA, WinoGrande lm-evaluation-harness (test) | ARC-e Accuracy | 81.57 | 43 |
| Zero-shot Language Understanding | ARC-Easy, ARC-Challenge, HellaSwag, LAMBADA, PIQA lm-eval 0.4.11 (test) | Average Accuracy | 80.8 | 42 |
| Language Modeling | C4 | Word Perplexity | 20.45 | 32 |
| Feature Space Preservation | WikiText-2 | Cosine Similarity | 98.92 | 32 |
Showing 10 of 16 rows
