LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
About
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. In this paper, we first uncover a fundamental connection between the optimization processes of LoRA and full fine-tuning: using LoRA for optimization is mathematically equivalent to full fine-tuning using a low-rank gradient for parameter updates, and this low-rank gradient can be expressed in terms of the gradients of the two low-rank matrices in LoRA. Leveraging this insight, we introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of these low-rank matrices. This adjustment allows the low-rank gradient to approximate the full fine-tuning gradient more accurately, thereby narrowing the performance gap between LoRA and full fine-tuning. Furthermore, we theoretically derive the optimal solutions for adjusting the gradients of the low-rank matrices and apply them during fine-tuning in LoRA-Pro. We conduct extensive experiments across natural language understanding, dialogue generation, mathematical reasoning, code generation, and image classification tasks, demonstrating that LoRA-Pro substantially improves LoRA's performance and narrows the gap with full fine-tuning. Code is publicly available at https://github.com/mrflogs/LoRA-Pro.
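The equivalence mentioned in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It assumes the common LoRA parameterization W = W0 + s·B·A with a scaling factor s, plain SGD as the optimizer, and a random matrix standing in for the full fine-tuning gradient; the function name `lora_equivalent_gradient` and all shapes are hypothetical. By the chain rule, one step on A and B changes W, to first order in the learning rate, by the negative learning rate times s·(g_B·A + B·g_A), a matrix of rank at most 2r.

```python
import numpy as np

def lora_equivalent_gradient(g, A, B, s):
    """Gradients of A and B, and the induced gradient on W (hypothetical helper).

    Assumes W = W0 + s * B @ A, where g is dL/dW with shape (m, n),
    A has shape (r, n), and B has shape (m, r).
    """
    g_B = s * g @ A.T                      # chain rule: dL/dB, shape (m, r)
    g_A = s * B.T @ g                      # chain rule: dL/dA, shape (r, n)
    g_tilde = s * (g_B @ A + B @ g_A)      # equivalent low-rank gradient on W
    return g_A, g_B, g_tilde

rng = np.random.default_rng(0)
m, n, r = 6, 5, 2                          # illustrative sizes only
s, lr = 2.0, 1e-4                          # assumed scaling factor and step size
A = rng.normal(size=(r, n))
B = rng.normal(size=(m, r))
g = rng.normal(size=(m, n))                # stand-in for the full gradient dL/dW

g_A, g_B, g_tilde = lora_equivalent_gradient(g, A, B, s)

# One SGD step on the two low-rank matrices.
A_new = A - lr * g_A
B_new = B - lr * g_B

# Resulting change in the effective weight W = W0 + s * B @ A.
delta_W = s * (B_new @ A_new - B @ A)

# delta_W equals -lr * g_tilde plus a second-order term lr^2 * s * g_B @ g_A.
second_order = lr**2 * s * (g_B @ g_A)
first_order_error = np.linalg.norm(delta_W + lr * g_tilde)
rank_g_tilde = np.linalg.matrix_rank(g_tilde)

print("relative first-order error:", first_order_error / (lr * np.linalg.norm(g_tilde)))
print("rank of equivalent gradient:", rank_g_tilde, "(at most 2r =", 2 * r, ")")
```

In this sketch `g_tilde` generally differs from `g` itself, which is the gap that, according to the abstract, LoRA-Pro reduces by adjusting the gradients of the two low-rank matrices.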
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText2 (val) | Perplexity (PPL) | 20.06 | 423 |
| Mathematical Reasoning | GSM8K | Accuracy (Acc) | 75.7 | 337 |
| Reasoning | ARC | Accuracy | 80.9 | 245 |
| Common Sense Reasoning | BoolQ | Accuracy | 70.8 | 240 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) | BoolQ Accuracy | 69.6 | 223 |
| Natural language generation | E2E (test) | ROUGE-L | 71.7 | 100 |
| Data-to-text generation | DART (test) | BLEU | 44.9 | 64 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 84.5 | 58 |
| Diffusion Fine-tuning | Mix-of-Show | CLIP Score | 31.47 | 16 |
| Natural Language Understanding | GLUE base (test dev) | CoLA MCC | 71.36 | 11 |