
SignRoundV2: Toward Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

About

Extremely low-bit quantization is critical for efficiently deploying Large Language Models (LLMs), yet it often leads to severe performance degradation at 2 bits and even at 4 bits (e.g., MXFP4). We present SignRoundV2, a post-training quantization framework designed to maintain high performance even under aggressive compression. SignRoundV2 introduces (1) a simple yet efficient adaptive mixed-precision strategy that leverages gradient information and quantization-induced reconstruction errors to guide layer-wise bit allocation, and (2) a set of lightweight stabilization techniques, including loss filtering and a pre-tuning scale search, to improve tuning effectiveness in extremely low-bit regimes. Our approach takes a significant step toward closing the performance gap between quantized and full-precision models. Experimental results across diverse LLMs demonstrate that SignRoundV2 achieves near-lossless performance in mixed MXFP settings, narrowing the gap to ~1% at an average of 4.5 bits, while substantially improving accuracy in challenging 2-bit weight-only quantization. The source code is available at https://github.com/intel/auto-round.
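To make the idea of sensitivity-guided, layer-wise bit allocation more concrete, here is a minimal sketch. It is not the SignRoundV2 algorithm or the auto-round API. The function names, the scoring formula (sum of |gradient × quantization error|), the uniform quantizer standing in for MXFP formats, and the greedy upgrade rule are all assumptions chosen for illustration.

```python
def make_uniform_quantizer(bits, max_abs):
    """Hypothetical symmetric uniform quantizer (a stand-in, not MXFP)."""
    levels = 2 ** (bits - 1) - 1          # e.g. 1 level per side at 2 bits
    step = max_abs / levels
    def quantize(w):
        q = max(-levels, min(levels, round(w / step)))
        return q * step
    return quantize


def layer_sensitivity(weights, grads, quantize):
    """Assumed score: sum of |gradient * quantization error| over a layer."""
    return sum(abs(g * (w - quantize(w))) for w, g in zip(weights, grads))


def allocate_bits(sensitivity, sizes, target_avg_bits, low=2, high=4):
    """Greedy allocation: start every layer at `low` bits, then upgrade the
    layers with the highest per-weight sensitivity while the average
    bit-width stays within `target_avg_bits`."""
    total = sum(sizes)
    budget = target_avg_bits * total      # total bits we may spend
    bits = [low] * len(sizes)
    used = low * total
    order = sorted(range(len(sizes)),
                   key=lambda i: sensitivity[i] / sizes[i],
                   reverse=True)
    for i in order:
        extra = (high - low) * sizes[i]
        if used + extra <= budget:
            bits[i] = high
            used += extra
    return bits, used / total


# Toy usage: three layers with made-up sensitivities and sizes.
bits, avg = allocate_bits([10.0, 1.0, 5.0], [100, 100, 200], target_avg_bits=3.0)
print(bits, avg)
```

In this toy run the largest layer stays at the low bit-width because upgrading it would exceed the budget, which lets a smaller, less sensitive layer be upgraded instead. A real allocator would likely weigh such trade-offs more carefully than this greedy pass does.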

Wenhua Cheng, Weiwei Zhang, Heng Guo, Haihao Shen, Zaner Ma • 2025

Related benchmarks

Task | Dataset | Result | Rank
Zero-shot Evaluation | PIQA, WinoGrande, HellaSwag, ARC (Easy and Challenge), LAMBADA (test) | Average Accuracy: 72.68 | 90
Large Language Model Evaluation | 10 tasks average | Avg Accuracy: 70.5 | 50
LLM Quantization | Llama-2-70B | GPU Hours (h): 2.5 | 13
