NeUQI: Near-Optimal Uniform Quantization Parameter Initialization for Low-Bit LLMs

About

Large language models (LLMs) achieve impressive performance across domains, but their high memory consumption and inference costs make them difficult to deploy on consumer-grade GPUs or personal devices such as laptops. Post-training quantization (PTQ) of LLMs offers a promising solution that reduces their memory footprint and decoding latency. In practice, PTQ with a uniform quantization representation is favored for its efficiency and ease of deployment, as uniform quantization is widely supported by mainstream hardware and software libraries. Recent studies on low-bit uniform quantization have led to noticeable improvements in post-quantization model performance; however, they mainly focus on quantization methodologies, while the initialization of quantization parameters remains underexplored and still relies on the conventional Min-Max formula. In this work, we identify the limitations of the Min-Max formula, move beyond its constraints, and propose NeUQI, a method that efficiently determines near-optimal initialization for uniform quantization. NeUQI simplifies the joint optimization of the scale and zero-point by deriving the zero-point for a given scale, thereby reducing the problem to a scale-only optimization. Benefiting from the improved quantization parameters, NeUQI consistently outperforms existing methods in experiments with the LLaMA and Qwen model families across various settings and tasks. Furthermore, when combined with a lightweight distillation strategy, NeUQI even outperforms PV-tuning, a considerably more resource-intensive method.
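To make the contrast between Min-Max initialization and a scale-only search concrete, here is a minimal NumPy sketch; it is not the authors' implementation. It assumes asymmetric per-group quantization with integer codes in [0, 2**bits - 1], plain squared weight error as the objective, and a grid of candidate scales given as fractions of the Min-Max scale. For each candidate scale, the zero-point is found by brute force over all integer codes, a hypothetical stand-in for the zero-point derivation NeUQI uses. All function names, the grid bounds, and the random test group are illustrative.

```python
import numpy as np


def quantize_dequantize(w, scale, zero, bits):
    """Asymmetric uniform quantization to integer codes in [0, 2**bits - 1], then back to floats."""
    qmax = 2 ** bits - 1
    q = np.clip(np.round(w / scale) + zero, 0, qmax)
    return scale * (q - zero)


def minmax_init(w, bits):
    """Conventional Min-Max initialization: the integer grid spans [min(w), max(w)]."""
    qmax = 2 ** bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = max((w_max - w_min) / qmax, 1e-8)  # guard against a constant group
    zero = np.round(-w_min / scale)
    return scale, zero


def squared_error(w, scale, zero, bits):
    """Sum of squared differences between weights and their quantized reconstruction."""
    return float(np.sum((w - quantize_dequantize(w, scale, zero, bits)) ** 2))


def scale_only_search(w, bits, num_candidates=100, lo=0.5, hi=1.0):
    """Search over scales only; for each scale, pick the best zero-point.

    Hypothetical stand-in: the zero-point is found by brute force over all
    integer codes, NOT by the derivation used in the NeUQI paper.
    """
    qmax = 2 ** bits - 1
    s_minmax, _ = minmax_init(w, bits)
    best_err, best_scale, best_zero = np.inf, s_minmax, 0
    # Candidate scales are fractions of the Min-Max scale (an assumed grid);
    # np.linspace includes hi exactly, so the Min-Max scale itself is tried.
    for scale in s_minmax * np.linspace(lo, hi, num_candidates):
        for zero in range(qmax + 1):
            err = squared_error(w, scale, zero, bits)
            if err < best_err:
                best_err, best_scale, best_zero = err, scale, zero
    return best_scale, best_zero, best_err


# Usage on one hypothetical group of 128 Gaussian "weights" at 3 bits.
rng = np.random.default_rng(0)
w = rng.standard_normal(128)
bits = 3

s_mm, z_mm = minmax_init(w, bits)
mm_err = squared_error(w, s_mm, z_mm, bits)
s_opt, z_opt, opt_err = scale_only_search(w, bits)
print(f"Min-Max: scale={s_mm:.4f}, zero={z_mm:.0f}, error={mm_err:.4f}")
print(f"Search:  scale={s_opt:.4f}, zero={z_opt}, error={opt_err:.4f}")
```

Because the grid contains the Min-Max scale and, for a group with both negative and positive weights, its zero-point, the searched error cannot exceed the Min-Max error on that group. The nested loop costs num_candidates × 2**bits error evaluations per group, which is only reasonable at low bit widths; obtaining the zero-point for each scale efficiently, as the abstract describes, would remove the inner loop.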

Li Lin, Xinyu Hu, Xiaojun Wan (2025)

Related benchmarks

Task	Dataset	Metric	Value	Count
Language Modeling	WikiText-2	Perplexity (PPL)	5.99	2862
Language Modeling	C4 (val)	PPL	8.88	908
Language Modeling	C4	Perplexity	7.57	482
Language Modeling	WikiText2 (val)	Perplexity (PPL)	7.03	436
Language Modeling	WikiText-2	WikiText-2 Score	4.99	86
Zero-shot Evaluation	Zero-shot Evaluation Suite (ARC-c, ARC-e, HellaSwag, PIQA, WinoGrande)	ARC-c Accuracy	48.04	82
Zero-shot Classification	5 zero-shot tasks	Accuracy	62.77	55
Zero-shot Evaluation	Evaluation Benchmarks Zero-shot	Average Accuracy	56.77	55
Zero-shot Language Understanding	Zero-shot Benchmarks	Average Zero-shot Accuracy	73.07	21
Language Modeling	C4	C4 Score	11.72	12
