
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning

About

We propose a simple approach for memory-efficient adaptation of pretrained language models. Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision low-rank component and a memory-efficient quantized component. During finetuning, the quantized component remains fixed and only the low-rank component is updated. We present an integer linear programming formulation of the quantization component which enables dynamic configuration of quantization parameters (e.g., bit-width, block size) for each matrix given an overall target memory budget. We further explore a data-aware version of the algorithm which uses an approximation of the Fisher information matrix to weight the reconstruction objective during matrix decomposition. Experiments on finetuning RoBERTa and LLaMA-2 (7B and 70B) demonstrate that our low-rank plus quantized matrix decomposition approach (LQ-LoRA) outperforms strong QLoRA and GPTQ-LoRA baselines and enables aggressive quantization to sub-3 bits with only minor performance degradations. When finetuned on a language modeling calibration dataset, LQ-LoRA can also be used for model compression; in this setting our 2.75-bit LLaMA-2-70B model (which has 2.85 bits on average when including the low-rank components and requires 27GB of GPU memory) performs respectably compared to the 16-bit baseline.

Han Guo, Philip Greengard, Eric P. Xing, Yoon Kim • 2023

Related benchmarks

Task                             | Dataset                                                     | Metric           | Result | Rank
---------------------------------|-------------------------------------------------------------|------------------|--------|-----
Language Modeling                | WikiText2                                                   | Perplexity       | 5.67   | 2839
Question Answering               | ARC Easy                                                    | --               | --     | 597
Natural Language Inference       | RTE                                                         | Accuracy         | 72.4   | 448
Commonsense Reasoning            | WinoGrande                                                  | Accuracy         | 70.31  | 372
Question Answering               | ARC Challenge                                               | Accuracy (ARC)   | 41.41  | 142
Natural Language Inference       | aNLI                                                        | Accuracy         | 34.38  | 65
Reading Comprehension            | BoolQ                                                       | Accuracy (BoolQ) | 76.56  | 55
Physical Commonsense Reasoning   | PIQA                                                        | Accuracy         | 75     | 45
Language Modeling                | LLaMA-2 13B                                                 | Perplexity (PPL) | 7.32   | 32
Aggregated Downstream Evaluation | ANLI, BoolQ, Winogrande, RTE, PiQA, ARC-Easy, ARC-Challenge | Average Accuracy | 60.81  | 8
Showing 10 of 14 rows
