LoRAQuant: Mixed-Precision Quantization of LoRA to Ultra-Low Bits

About

Low-Rank Adaptation (LoRA) has become a popular technique for parameter-efficient fine-tuning of large language models (LLMs). In many real-world scenarios, multiple adapters are loaded simultaneously to customize an LLM for individual users or to support a diverse range of tasks. Although each adapter is lightweight in isolation, their aggregate cost becomes substantial at scale. To address this, we propose LoRAQuant, a mixed-precision post-training quantization method tailored to LoRA. Specifically, LoRAQuant reparameterizes each adapter by singular value decomposition (SVD) to concentrate the most important information into specific rows and columns. This makes it possible to quantize the important components to higher precision while quantizing the rest to an ultra-low bitwidth. We conduct comprehensive experiments with LLaMA 2-7B, LLaMA 2-13B, and Mistral 7B models on mathematical reasoning, coding, and summarization tasks. Results show that LoRAQuant uses significantly fewer bits than other quantization methods while achieving comparable or even higher performance.
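The idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the even split of each singular value between the two factors, the fixed cutoff k_high, the per-row min-max quantizer, and the sign-based 1-bit quantizer are all assumptions made for illustration; the abstract does not specify how the method makes these choices.

```python
import numpy as np

def reparameterize(B, A):
    """Re-express the update B @ A via SVD so components are sorted by singular value."""
    r = B.shape[1]
    U, S, Vt = np.linalg.svd(B @ A, full_matrices=False)
    root = np.sqrt(S[:r])              # assumption: split sigma_i evenly between factors
    B_new = U[:, :r] * root            # column i scaled by sqrt(sigma_i)
    A_new = root[:, None] * Vt[:r]     # row i scaled by sqrt(sigma_i)
    return B_new, A_new

def quantize_rows(M, bits):
    """Quantize each row independently and return the dequantized values."""
    if bits == 1:
        # Assumed 1-bit scheme: sign of each entry times the row's mean magnitude.
        scale = np.abs(M).mean(axis=1, keepdims=True)
        return np.where(M >= 0, 1.0, -1.0) * scale
    # Assumed multi-bit scheme: uniform min-max quantization per row.
    lo = M.min(axis=1, keepdims=True)
    hi = M.max(axis=1, keepdims=True)
    step = (hi - lo) / (2 ** bits - 1)
    step = np.where(step == 0, 1.0, step)  # guard against constant rows
    return np.round((M - lo) / step) * step + lo

def quantize_adapter(B, A, k_high, high_bits=3, low_bits=1):
    """Keep the top k_high components at high_bits; quantize the rest to low_bits."""
    B_new, A_new = reparameterize(B, A)
    A_q = np.vstack([quantize_rows(A_new[:k_high], high_bits),
                     quantize_rows(A_new[k_high:], low_bits)])
    Bt = B_new.T                       # treat columns of B_new as rows
    B_q = np.vstack([quantize_rows(Bt[:k_high], high_bits),
                     quantize_rows(Bt[k_high:], low_bits)]).T
    return B_q, A_q

# Toy usage with random factors (hypothetical shapes, not from the paper).
rng = np.random.default_rng(0)
B = rng.normal(size=(64, 8))
A = rng.normal(size=(8, 48))
B_q, A_q = quantize_adapter(B, A, k_high=2)
rel_err = np.linalg.norm(B_q @ A_q - B @ A) / np.linalg.norm(B @ A)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Because the SVD orders components by singular value, the first k_high columns of B_new and rows of A_new carry the largest share of the update, which is why they are the ones given more bits in this sketch.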

Amir Reza Mirzaei, Yuqiao Wen, Yanshuai Cao, Lili Mou • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Mathematical Reasoning	GSM8K	Accuracy (Acc)	42.3	352
Mathematical Reasoning	Minerva Math	Accuracy	8.94	124
Summarization	Xsum	ROUGE-L	15.49	42
Mathematical Reasoning	Minerva Math	Minerva Math 4-shot Accuracy	15.2	25
Mathematical Reasoning	GSM8K	GSM8K 8-shot Accuracy	51.63	25
Mathematics	Minerva Math	4-shot Performance (%)	8.94	21
Abstractive Summarization	Xsum	ROUGE-L	18.27	10
Summarization	Xsum	ROUGE-L F1	18.27	9

