LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning
About
Fine-tuning large language models (LLMs) is crucial for improving their performance on downstream tasks, but full-parameter fine-tuning (Full-FT) is computationally expensive and memory-intensive. Parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), address this by optimizing only a small subset of parameters. However, LoRA may underperform Full-FT in certain scenarios due to the intrinsic limitations of its low-rank gradients. In this work, we reveal an asymmetric, collapsible structure in LoRA's update: the low-rank modification to W can be reformulated as a single-layer linear regression, implying that one of the LoRA factors can be frozen without sacrificing expressivity. Leveraging this insight, we introduce LoRA-FA, which freezes the projection-down matrix A and trains only the projection-up matrix B. We further close the gap to Full-FT by deriving closed-form gradient corrections that minimize the discrepancy between the induced low-rank gradient and the full gradient. Through extensive experiments on diverse benchmarks, including GLUE, GSM8K, MT-Bench, and HumanEval, we demonstrate that LoRA-FA consistently achieves comparable performance to existing PEFT methods and Full-FT. Experiments on system efficiency show that LoRA-FA significantly reduces activation memory consumption and computational workload in fine-tuning. Our code is available at https://github.com/huggingface/peft.
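The core idea in the abstract, freezing the projection-down matrix A and training only the projection-up matrix B, can be illustrated with a small toy example. The sketch below is not the authors' implementation and leaves out the closed-form gradient corrections. It assumes a single linear layer y = Wx + B(Ax), a squared-error loss, random toy data, and a step size chosen only to keep this toy problem stable. All function names are hypothetical. It also shows that the gradient with respect to B depends only on the rank-r activation z = Ax, which is one plausible reading of the reduced activation memory the abstract reports.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def forward(W, A, B, x):
    """Compute y = W x + B (A x) and also return z = A x."""
    z = matvec(A, x)  # rank-r output of the projection-down matrix
    y = [w + u for w, u in zip(matvec(W, x), matvec(B, z))]
    return y, z


def loss_and_grad_B(W, A, B, data):
    """Squared-error loss over the batch and its gradient w.r.t. B only."""
    grad = [[0.0] * len(A) for _ in B]
    loss = 0.0
    for x, target in data:
        y, z = forward(W, A, B, x)
        err = [yi - ti for yi, ti in zip(y, target)]
        loss += 0.5 * sum(e * e for e in err)
        for i, e in enumerate(err):
            for j, zj in enumerate(z):
                grad[i][j] += e * zj  # dL/dB = err z^T: uses z, not x
    return loss, grad


def train_lora_fa(d_in=6, d_out=4, rank=2, steps=50, seed=0):
    """Toy training loop: W and A stay frozen, only B is updated."""
    rng = random.Random(seed)

    def gauss(rows, cols):
        return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

    W = gauss(d_out, d_in)                     # frozen "pretrained" weight
    A = gauss(rank, d_in)                      # frozen projection-down
    B = [[0.0] * rank for _ in range(d_out)]   # trainable, zero-initialised
    A_before = [row[:] for row in A]
    data = [(gauss(1, d_in)[0], gauss(1, d_out)[0]) for _ in range(8)]

    # Toy-only step size: 1 / sum ||A x||^2 bounds the curvature of this
    # quadratic loss, so full-batch gradient descent cannot diverge.
    lr = 1.0 / sum(sum(z * z for z in matvec(A, x)) for x, _ in data)

    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad_B(W, A, B, data)
        losses.append(loss)
        for i in range(d_out):
            for j in range(rank):
                B[i][j] -= lr * grad[i][j]
    return losses, A, A_before, B


if __name__ == "__main__":
    losses, A, A_before, B = train_lora_fa()
    print("first loss:", losses[0], "last loss:", losses[-1])
    print("A unchanged:", A == A_before)
    print("trainable parameters:", len(B) * len(B[0]))
```

With the default toy sizes, B holds 4 × 2 = 8 trainable values, whereas training both factors would update 2 × (6 + 4) = 20. In a real model the same freezing would normally be expressed through the training framework's own mechanism for excluding parameters from optimisation.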
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 15.91 | 1043 |
| Commonsense Reasoning | PIQA | Accuracy | 75.97 | 757 |
| Natural Language Understanding | GLUE | SST-2 | 93.65 | 551 |
| Reading Comprehension | RACE high | Accuracy | 79.03 | 295 |
| Image Classification | VTAB 1K | Overall Mean Accuracy | 68.2 | 281 |
| Common Sense Reasoning | HellaSwag | Accuracy | 89.16 | 213 |
| Reading Comprehension | RACE mid | Accuracy | 82.79 | 196 |
| Common Sense Reasoning | WinoGrande | Accuracy | 0.8216 | 189 |
| Mathematical Reasoning | GSM8K (val) | Accuracy | 40.25 | 108 |
| Code Generation | MBPP | Pass@1 Accuracy | 20.01 | 59 |