LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
About
The low-rank adaptation (LoRA) method can greatly reduce the number of trainable parameters for fine-tuning large language models (LLMs); however, it still requires expensive activation memory to update the low-rank weights. Reducing the number of LoRA layers or using activation recomputation can harm fine-tuning performance or increase computational overhead. In this work, we present LoRA-FA, a memory-efficient fine-tuning method that reduces activation memory without performance degradation or expensive recomputation. LoRA-FA freezes the projection-down weight $A$ and updates only the projection-up weight $B$ in each LoRA layer. This ensures that the change of the model weight resides in a low-rank space during LLM fine-tuning, while eliminating the need to store the full-rank input activations. We conduct extensive experiments across multiple model types (RoBERTa, T5, LLaMA) and model scales. Our results show that LoRA-FA consistently achieves fine-tuning accuracy close to that of full-parameter fine-tuning and LoRA across different tasks. Furthermore, LoRA-FA reduces the overall memory cost by up to 1.4$\times$ compared to LoRA.
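To make the idea concrete, below is a minimal PyTorch-style sketch of a LoRA-FA linear layer. The class name `LoRAFALinear` and the hyperparameters (`rank`, `alpha`, the initialization scale) are illustrative assumptions, not the authors' reference implementation; the point is simply that the pretrained weight and $A$ are frozen while only $B$ is trained, so the backward pass needs only the low-rank activation $xA^\top$ rather than the full-rank input $x$.

```python
import torch
import torch.nn as nn


class LoRAFALinear(nn.Module):
    """Illustrative sketch of a LoRA-FA layer: freeze W and A, train only B."""

    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        # Frozen pretrained weight W.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)

        # Projection-down A: randomly initialized and frozen (LoRA-FA).
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01,
                                   requires_grad=False)
        # Projection-up B: initialized to zero and trained.
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Only the r-dimensional activation x @ A^T is needed to compute the
        # gradient of B, so the full-rank input x need not be stored.
        low_rank_act = x @ self.lora_A.T
        return self.base(x) + self.scaling * (low_rank_act @ self.lora_B.T)
```

In such a setup, the optimizer would be constructed over the `lora_B` parameters only, e.g. `torch.optim.AdamW([p for p in model.parameters() if p.requires_grad])`.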
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 15.91 | 1036 |
| Commonsense Reasoning | PIQA | Accuracy | 75.97 | 751 |
| Natural Language Understanding | GLUE | SST-2 | 93.65 | 531 |
| Reading Comprehension | RACE high | Accuracy | 79.03 | 295 |
| Commonsense Reasoning | HellaSwag | Accuracy | 89.16 | 213 |
| Reading Comprehension | RACE mid | Accuracy | 82.79 | 196 |
| Commonsense Reasoning | WinoGrande | Accuracy | 0.8216 | 189 |
| Mathematical Reasoning | GSM8K (val) | Accuracy | 40.25 | 81 |
| Code Generation | MBPP | Pass@1 | 20.01 | 59 |
| Mathematical Reasoning | MATH (val) | Accuracy | 5.66 | 48 |