LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently

About

This paper explores how theory can guide and enhance practical algorithms, using Low-Rank Adaptation (LoRA, Hu et al. 2022) in large language models as a case study. We rigorously prove that, under gradient descent, LoRA adapters align with specific singular subspaces of the one-step full fine-tuning gradient. This result suggests that, by properly initializing the adapters using the one-step full gradient, subspace alignment can be achieved immediately, and that this holds for both linear and nonlinear models. Building on our theory, we propose a theory-driven algorithm, LoRA-One, for which we establish linear convergence (as well as generalization) and show that incorporating preconditioners theoretically helps mitigate the effects of ill-conditioning. In addition, our theory reveals connections between LoRA-One and other gradient-alignment-based methods, helping to clarify misconceptions in the design of such algorithms. LoRA-One achieves significant empirical improvements over LoRA and its variants across benchmarks in natural language understanding, mathematical reasoning, and code generation. Code is available at: https://github.com/YuanheZ/LoRA-One.
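To make the initialization idea in the abstract concrete, the following is a minimal sketch on a toy linear regression problem. It is not the authors' implementation: the function names (`one_step_gradient`, `spectral_init`), the `scale` parameter, the square-root split of the singular values between the two factors, and the convention that the adapted weight is `W0 + A @ B` are all assumptions made for illustration. The sketch computes the full gradient once at the pretrained weights, takes the top singular triplets of the negative gradient, and uses them to initialize the two low-rank factors.

```python
import numpy as np

def one_step_gradient(W, X, Y):
    """Full gradient of 0.5/n * ||X W - Y||_F^2 with respect to W."""
    n = X.shape[0]
    return X.T @ (X @ W - Y) / n

def spectral_init(G, rank, scale=1.0):
    """Build low-rank factors from the top-`rank` singular triplets of -G.

    Assumption: singular values are split evenly (square roots) between
    the two factors, so that A @ B equals scale * (best rank-r approx of -G).
    """
    U, S, Vt = np.linalg.svd(-G, full_matrices=False)
    sqrt_S = np.sqrt(S[:rank])
    A = np.sqrt(scale) * U[:, :rank] * sqrt_S          # shape (d_in, rank)
    B = np.sqrt(scale) * sqrt_S[:, None] * Vt[:rank]   # shape (rank, d_out)
    return A, B

# Toy setup (hypothetical sizes): the target differs from the
# "pretrained" weights W0 by a rank-r shift.
rng = np.random.default_rng(0)
d_in, d_out, n, r = 16, 12, 256, 4
W0 = rng.standard_normal((d_in, d_out))
delta = rng.standard_normal((d_in, r)) @ rng.standard_normal((r, d_out))
X = rng.standard_normal((n, d_in))
Y = X @ (W0 + delta)

def loss(W):
    """Mean squared-error loss matching one_step_gradient."""
    return 0.5 * np.mean(np.sum((X @ W - Y) ** 2, axis=1))

G = one_step_gradient(W0, X, Y)          # one full-gradient evaluation
A, B = spectral_init(G, rank=r, scale=1.0)

loss_before = loss(W0)
loss_after = loss(W0 + A @ B)            # adapters applied at initialization
print(f"loss before: {loss_before:.4f}, after init: {loss_after:.4f}")
```

In this toy problem the negative gradient at `W0` has rank at most `r`, so the rank-`r` factors reproduce it exactly and the initialization behaves like one full gradient step of size `scale`. In a real model the gradient is generally not low-rank, and the appropriate scaling would need to be chosen with care.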

Yuanhe Zhang, Fanghui Liu, Yudong Chen • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Object Hallucination Evaluation | POPE | Accuracy | 87.2 | 1455
Multimodal Evaluation | MME | Score | 1.38e+3 | 658
Visual Question Answering | GQA | Accuracy | 60.1 | 505
Multi-discipline Multimodal Understanding | MMMU | Accuracy | 35.9 | 317
Scientific Question Answering | ScienceQA image | Accuracy | 69.6 | 184
Math Reasoning | MATH | Accuracy | 11.3 | 121
Math Reasoning | GSM8K | Accuracy (GSM8K) | 65.5 | 49
Commonsense Reasoning | Commonsense Reasoning Benchmark | BoolQ Accuracy | 72.8 | 22
Natural Language Understanding | GLUE (val) | MNLI Accuracy | 85.49 | 6
Natural Language Understanding | GLUE (test) | MNLI Accuracy | 85.03 | 6