
NeUQI: Near-Optimal Uniform Quantization Parameter Initialization for Low-Bit LLMs

About

Large language models (LLMs) achieve impressive performance across domains but face significant challenges when deployed on consumer-grade GPUs or personal devices such as laptops, due to high memory consumption and inference costs. Post-training quantization (PTQ) of LLMs offers a promising solution that reduces their memory footprint and decoding latency. In practice, PTQ with uniform quantization representation is favored due to its efficiency and ease of deployment, as uniform quantization is widely supported by mainstream hardware and software libraries. Recent studies on low-bit uniform quantization have led to noticeable improvements in post-quantization model performance; however, they mainly focus on quantization methodologies, while the initialization of quantization parameters remains underexplored and still relies on the conventional Min-Max formula. In this work, we identify the limitations of the Min-Max formula, move beyond its constraints, and propose NeUQI, a method that efficiently determines near-optimal initialization for uniform quantization. Our NeUQI simplifies the joint optimization of the scale and zero-point by deriving the zero-point for a given scale, thereby reducing the problem to a scale-only optimization. Benefiting from the improved quantization parameters, our NeUQI consistently outperforms existing methods in the experiments with the LLaMA and Qwen families on various settings and tasks. Furthermore, when combined with a lightweight distillation strategy, NeUQI even achieves superior performance to PV-tuning, a considerably more resource-intensive method.
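The abstract contrasts the conventional Min-Max formula with an initialization that derives the zero-point for a given scale, so that only the scale needs to be searched. The following is a minimal Python sketch of that idea, not the authors' algorithm. The function names, the shrink-factor grid, and the brute-force choice of an integer zero-point per scale are all assumptions made for illustration; the abstract does not say how NeUQI derives the zero-point.

```python
import math


def quantize(weights, scale, zero, bits):
    """Uniformly quantize then dequantize: w ~ scale * (q - zero)."""
    qmax = 2 ** bits - 1
    out = []
    for w in weights:
        q = round(w / scale + zero)
        q = min(max(q, 0), qmax)  # clamp to the representable integer range
        out.append(scale * (q - zero))
    return out


def sq_error(weights, scale, zero, bits):
    """Sum of squared reconstruction errors."""
    deq = quantize(weights, scale, zero, bits)
    return sum((w - d) ** 2 for w, d in zip(weights, deq))


def min_max_init(weights, bits):
    """Conventional Min-Max initialization: the grid spans [min, max]."""
    lo, hi = min(weights), max(weights)
    qmax = 2 ** bits - 1
    scale = (hi - lo) / qmax
    if scale == 0:  # all weights equal; any positive scale works
        scale = 1.0
    zero = round(-lo / scale)
    return scale, zero


def scale_search_init(weights, bits, steps=50):
    """Illustrative scale-only search (hypothetical, not NeUQI itself).

    For each candidate scale, the zero-point is chosen by brute force over
    the integers that keep the grid inside [min, max], so the outer search
    runs over the scale alone.
    """
    lo, hi = min(weights), max(weights)
    qmax = 2 ** bits - 1
    base_scale, base_zero = min_max_init(weights, bits)
    best_err = sq_error(weights, base_scale, base_zero, bits)
    best = (base_scale, base_zero)
    for i in range(1, steps + 1):
        scale = base_scale * (0.5 + 0.5 * i / steps)  # 0.5x .. 1.0x of Min-Max
        z_lo = math.floor(qmax - hi / scale)  # grid top aligned with max
        z_hi = math.ceil(-lo / scale)         # grid bottom aligned with min
        for zero in range(z_lo, z_hi + 1):
            err = sq_error(weights, scale, zero, bits)
            if err < best_err:
                best_err, best = err, (scale, zero)
    return best


weights = [-1.0, -0.2, -0.1, 0.0, 0.05, 0.1, 0.15, 0.3, 2.0]
mm_scale, mm_zero = min_max_init(weights, bits=2)
s_scale, s_zero = scale_search_init(weights, bits=2)
print("Min-Max error:", sq_error(weights, mm_scale, mm_zero, 2))
print("Search error: ", sq_error(weights, s_scale, s_zero, 2))
```

Because the search starts from the Min-Max parameters and only accepts strict improvements, its reconstruction error can never exceed the Min-Max error on the same weights. A practical method would replace the inner brute-force loop with a closed-form or otherwise efficient zero-point derivation, which is the part the paper contributes.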

Li Lin, Xinyu Hu, Xiaojun Wan • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Language Modeling | WikiText-2 | Perplexity (PPL) | 5.99 | 2320
Language Modeling | C4 (val) | PPL | 8.88 | 737
Language Modeling | WikiText2 (val) | Perplexity (PPL) | 7.03 | 423
Language Modeling | C4 | Perplexity | 7.57 | 72
Zero-shot Classification | 5 zero-shot tasks | Accuracy | 62.77 | 55
Zero-shot Evaluation | Zero-shot Evaluation Suite (ARC-c, ARC-e, HellaSwag, PIQA, WinoGrande) | ARC-c Accuracy | 48.04 | 52
Zero-shot Evaluation | Evaluation Benchmarks Zero-shot | Average Accuracy | 56.77 | 34
Language Modeling | WikiText-2 | WikiText-2 Score | 4.99 | 32
Zero-shot Language Understanding | Zero-shot Benchmarks | Average Zero-shot Accuracy | 73.07 | 21
Language Modeling | C4 | C4 Score | 11.72 | 12
