Balanced LoRA: Removing Parameter Invariance to Accelerate Convergence
About
Low-Rank Adaptation (LoRA) is the most widely adopted method for fine-tuning large language models. Notably, LoRA is inherently overparameterized: multiple pairs of low-rank factors can yield the same adapted weight matrix. We show, both theoretically and empirically, that these pairs exhibit significantly different condition numbers. As a result, which loss minimizer the optimizer converges to directly affects the convergence rate of LoRA. Building on this observation, we introduce Balanced Low-Rank Adaptation (BaLoRA), a variant of LoRA that projects iterates onto a balanced manifold. This projection improves the conditioning of the loss landscape while preserving the adapted matrix. The projection step is computationally lightweight and integrates seamlessly into existing fine-tuning pipelines. Empirically, BaLoRA converges faster than standard LoRA and achieves superior performance across a range of fine-tuning tasks.
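The invariance described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not define the balanced manifold, so the sketch assumes the convention common in matrix factorization, namely that a pair of factors `B` (m x r) and `A` (r x n) is balanced when `B.T @ B == A @ A.T`. The function name `balance_factors` and the symbols `B`, `A`, and `G` are hypothetical and chosen for illustration only; the sketch also assumes `r <= min(m, n)`.

```python
import numpy as np


def balance_factors(B, A):
    """Return (B_bal, A_bal) with the same product as (B, A) and
    B_bal.T @ B_bal == A_bal @ A_bal.T (the assumed balance condition)."""
    # Thin QR of each factor: B = Qb @ Rb and A.T = Qa @ Ra.
    Qb, Rb = np.linalg.qr(B)
    Qa, Ra = np.linalg.qr(A.T)
    # SVD of the small r x r core, so B @ A = Qb @ U @ diag(s) @ Vh @ Qa.T.
    U, s, Vh = np.linalg.svd(Rb @ Ra.T)
    root = np.sqrt(s)
    # Split each singular value evenly between the two factors.
    B_bal = (Qb @ U) * root               # scales the columns of Qb @ U
    A_bal = (root[:, None] * Vh) @ Qa.T   # scales the rows of Vh @ Qa.T
    return B_bal, A_bal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, r = 64, 48, 4
    B = rng.standard_normal((m, r))
    A = rng.standard_normal((r, n))

    # Any invertible G gives another pair with the same adapted matrix.
    G = np.diag([1e-2, 1.0, 1.0, 1e2])
    B_skew, A_skew = B @ G, np.linalg.inv(G) @ A
    print("same product:", np.allclose(B_skew @ A_skew, B @ A))

    B_bal, A_bal = balance_factors(B_skew, A_skew)
    print("product preserved:", np.allclose(B_bal @ A_bal, B @ A))
    print("cond(B) skewed:", np.linalg.cond(B_skew))
    print("cond(B) balanced:", np.linalg.cond(B_bal))
```

Because the SVD is taken only on an r x r core, the cost of this particular construction grows linearly in m and n for fixed rank, which is consistent with the abstract's description of the projection as lightweight. Whether BaLoRA uses this exact construction is not stated in the text above.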
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | Wikitext-2 raw v1 | Loss | 2.251 | 10 |
| Mathematical Reasoning | GSM8K (test) | Loss | 0.493 | 10 |
| Fine-tuning | Alpaca | Evaluation Loss | 1.35 | 7 |
| Fine-tuning | OpenOrca | Evaluation Loss | 0.773 | 7 |
| Fine-tuning | CodeFeedback | Evaluation Loss | 0.638 | 7 |
| Fine-tuning | WizardLM | Evaluation Loss | 0.662 | 7 |
| Fine-tuning | OpenHermes | Evaluation Loss | 0.707 | 7 |
| Mathematical Reasoning | MetaMathQA 100k-samples | Loss | 0.144 | 5 |