
Benford's Law as a Distributional Prior for Post-Training Quantization of Large Language Models

About

The rapid growth of Large Language Models (LLMs) intensifies the need for effective compression, with weight quantization being the most widely adopted technique. Standard uniform quantizers assume that parameters are evenly distributed, an assumption at odds with the highly skewed distributions observed in practice. We propose Benford-Quant, a simple, data-free non-uniform quantizer inspired by Benford's Law, which predicts that leading digits follow a logarithmic distribution. Benford-Quant replaces the uniform grid with a log-spaced codebook, dedicating more resolution to the frequent small-magnitude weights. We provide both theoretical intuition and empirical evidence: (i) weights in the transformational layers of transformers adhere closely to Benford statistics, while normalization layers systematically deviate; (ii) on Small Language Models (SLMs), Benford-Quant consistently improves perplexity, reducing 4-bit perplexity on Gemma-270M by more than 10%; and (iii) on larger LLMs, it remains competitive, with differences explained by over-parameterization effects. Our results indicate that incorporating a Benford-inspired prior into quantization grids is a low-cost modification that yields accuracy gains in aggressive few-bit regimes. Although it does not surpass the state of the art on metrics such as perplexity and LAMBADA accuracy, Benford-Quant can be hybridized with other quantization methods, such as SmoothQuant and Activation-Aware Quantization, without major pipeline modifications, potentially improving their performance.
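To make the core idea concrete, here is a minimal, illustrative sketch of a log-spaced (Benford-inspired) codebook quantizer. This is not the authors' released implementation; the function names, the symmetric geometric spacing, and the Laplace-distributed example weights are assumptions chosen only to show how a log-spaced grid concentrates quantization levels near small magnitudes.

```python
import numpy as np

def benford_leading_digit_probs():
    """Benford's Law: P(d) = log10(1 + 1/d) for leading digit d in 1..9."""
    d = np.arange(1, 10)
    return np.log10(1 + 1 / d)

def log_spaced_codebook(w, bits=4, eps=1e-8):
    """Build a symmetric, geometrically spaced codebook over |w|.

    Levels are log-spaced between the smallest and largest nonzero
    magnitude, so far more levels land near zero than a uniform grid
    would place there. One code is spent on exact zero.
    """
    a = np.abs(w)
    lo = max(a[a > 0].min(), eps)
    hi = a.max()
    n_pos = 2 ** (bits - 1) - 1            # positive levels per sign
    pos = np.geomspace(lo, hi, n_pos)       # geometric (log-uniform) grid
    return np.concatenate([-pos[::-1], [0.0], pos])

def quantize(w, codebook):
    """Round each weight to its nearest codebook entry."""
    idx = np.argmin(np.abs(w[..., None] - codebook), axis=-1)
    return codebook[idx]

# Example: heavy-tailed weights, loosely mimicking LLM weight statistics.
rng = np.random.default_rng(0)
w = rng.laplace(scale=0.02, size=1024)
cb = log_spaced_codebook(w, bits=4)
wq = quantize(w, cb)
```

Because the grid is geometric rather than uniform, the relative quantization error is roughly constant across magnitudes, which matches the abstract's intuition of dedicating resolution to the frequent small-magnitude weights.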

Arthur Negrão, Pedro Silva, Vander L. S. Freitas, Gladston Moreira, Eduardo Luz • 2026

Related benchmarks

Task               Dataset                    Result                        Rank
Language Modeling  WikiText-2 (test)          PPL 7.02                      1541
Language Modeling  C4 (val)                   PPL 11.13                     392
Language Modeling  LAMBADA zero-shot (test)   Accuracy (zero-shot) 36.77    44
