LoPRo: Enhancing Low-Rank Quantization via Permuted Block-Wise Rotation
About
Post-training quantization (PTQ) enables effective model compression while preserving relatively high accuracy. Current weight-only PTQ methods primarily focus on the challenging sub-3-bit regime, where existing approaches often suffer significant accuracy degradation and typically require fine-tuning to achieve competitive performance. In this work, we revisit the fundamental characteristics of weight quantization and analyze the challenges in quantizing the residual matrix under low-rank approximation. We propose LoPRo, a novel fine-tuning-free PTQ algorithm that enhances residual matrix quantization by applying block-wise permutation and Walsh-Hadamard transformations to rotate columns of similar importance, while explicitly preserving the quantization accuracy of the most salient column blocks. Furthermore, we introduce a mixed-precision fast low-rank decomposition based on a rank-1 sketch (R1SVD) to further reduce quantization cost. Experiments demonstrate that LoPRo outperforms existing fine-tuning-free PTQ methods at both 2-bit and 3-bit quantization, achieving accuracy comparable to fine-tuning-based baselines. Specifically, LoPRo achieves state-of-the-art quantization accuracy on LLaMA-2 and LLaMA-3 series models while delivering up to a 4$\times$ speedup. On the MoE model Mixtral-8x7B, LoPRo completes quantization within 2.5 hours while reducing perplexity by 0.4 and improving accuracy by 8%. Moreover, compared to other low-rank quantization methods, LoPRo achieves superior accuracy at a significantly lower rank, while maintaining high inference efficiency and minimal additional latency.
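To make the pipeline above concrete, here is a minimal NumPy sketch of the general technique: split a weight matrix into a low-rank part plus a residual, permute the residual columns so that columns of similar importance share a block, rotate each block with a Walsh-Hadamard transform before quantizing, and leave the most salient blocks unrotated. This is only an illustrative sketch, not the LoPRo implementation: the helper names (`lopro_sketch`, `quantize_sym`), the symmetric round-to-nearest quantizer, the plain SVD (standing in for the faster R1SVD), and the column-importance scores are all assumptions made for the example.

```python
import numpy as np

def walsh_hadamard(n: int) -> np.ndarray:
    """Orthonormal Walsh-Hadamard matrix of size n (n must be a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_sym(W: np.ndarray, bits: int) -> np.ndarray:
    """Per-column symmetric round-to-nearest quantization (stand-in quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=0, keepdims=True) / qmax
    scale[scale == 0] = 1.0
    return np.round(W / scale) * scale

def lopro_sketch(W, importance, rank=16, block=64, bits=2, keep_blocks=1):
    """Low-rank + permuted, block-rotated residual quantization sketch.

    importance:  one score per column (e.g. an activation-based saliency proxy)
    keep_blocks: number of the most important column blocks left unrotated,
                 preserving their quantization accuracy explicitly.
    """
    assert W.shape[1] % block == 0, "columns must divide evenly into blocks"

    # 1) Low-rank part via plain SVD (the paper uses a faster R1SVD variant).
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank]
    R = W - L                                    # residual matrix to quantize

    # 2) Permute residual columns so similar-importance columns share a block
    #    (ascending sort: the most salient blocks end up last).
    order = np.argsort(importance)
    Rp = R[:, order]

    # 3) Rotate each block with a Walsh-Hadamard transform before quantizing,
    #    except the keep_blocks most salient blocks, which are quantized as-is.
    H = walsh_hadamard(block)
    n_blocks = Rp.shape[1] // block
    Rq = np.empty_like(Rp)
    for b in range(n_blocks):
        cols = slice(b * block, (b + 1) * block)
        if b >= n_blocks - keep_blocks:          # most salient: no rotation
            Rq[:, cols] = quantize_sym(Rp[:, cols], bits)
        else:                                    # rotate, quantize, rotate back
            Rq[:, cols] = quantize_sym(Rp[:, cols] @ H, bits) @ H.T

    # 4) Undo the permutation and recombine with the low-rank part.
    inv = np.argsort(order)
    return L + Rq[:, inv]

# Demo: reconstruct a random matrix and report relative error.
W = np.random.randn(256, 256)
importance = np.abs(W).mean(axis=0)   # crude saliency proxy, for demo only
W_hat = lopro_sketch(W, importance, rank=32, block=64, bits=3)
print(np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

Because the Walsh-Hadamard matrix is orthogonal, rotating a block before quantization spreads large-magnitude outliers across the block's columns, which flattens the value distribution the quantizer sees; rotating back afterwards recovers the original basis.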
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity | 4.15 | 1875 |
| Zero-shot Question Answering and Reasoning | Zero-shot Accuracy Tasks (AC, AE, WI, QA) | AC Score | 61 | 52 |
| Large Language Model Evaluation | Open LLM Leaderboard v1 (test) | Average Score | 66.1 | 14 |
| Language Modeling | LLaMA-2 Family Evaluation v2 (test) | PPL | 4.8 | 10 |
| Zero-shot Classification | Zero-shot Evaluation Suite (AC, AE, WI, QA) v1 | AC Score | 46.2 | 10 |