
Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models

About

Low-Rank Adaptation (LoRA) has emerged as a popular parameter-efficient fine-tuning (PEFT) method, which freezes the pretrained model weights and updates an additive low-rank trainable matrix. In this work, we study how to enhance LoRA training by introducing an $r \times r$ preconditioner in each gradient step, where $r$ is the LoRA rank. We theoretically verify that the proposed preconditioner stabilizes feature learning with LoRA in the infinite-width neural network setting. In practice, implementing this new preconditioner requires only a small change to existing optimizer code and adds negligible storage and runtime overhead. Our experimental results with both large language models and text-to-image diffusion models show that, with this new preconditioner, the convergence and reliability of SGD and AdamW can be significantly enhanced. Moreover, the training process becomes much more robust to hyperparameter choices such as the learning rate. The new preconditioner can be derived from a novel Riemannian metric in the low-rank matrix field. Code is available at https://github.com/pilancilab/Riemannian_Preconditioned_LoRA.
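
To make the idea of an $r \times r$ preconditioner concrete, here is a minimal NumPy sketch of one preconditioned SGD step on a pair of LoRA factors. The abstract does not spell out the formula, so the sketch assumes the scaled-gradient form in which the gradient of each factor is multiplied by the inverse Gram matrix of the other factor. The function name `preconditioned_lora_sgd_step`, the damping term `delta`, and the shape conventions are illustrative assumptions, not the API of the linked repository.

```python
import numpy as np

def preconditioned_lora_sgd_step(A, B, grad_A, grad_B, lr=0.1, delta=1e-6):
    """One SGD step on LoRA factors with r x r preconditioners (illustrative).

    Assumed shapes: B is (m, r), A is (r, n); the adapter update is B @ A.
    Assumed preconditioners (not stated in the abstract):
        grad_A <- (B^T B + delta I)^{-1} grad_A
        grad_B <- grad_B (A A^T + delta I)^{-1}
    """
    r = A.shape[0]
    # r x r Gram matrices; delta is a hypothetical damping term that keeps
    # the linear solves well-posed when a factor is rank deficient.
    P_A = B.T @ B + delta * np.eye(r)
    P_B = A @ A.T + delta * np.eye(r)
    # Solve linear systems rather than forming explicit inverses.
    scaled_grad_A = np.linalg.solve(P_A, grad_A)      # shape (r, n)
    # P_B is symmetric, so this equals grad_B @ inv(P_B), shape (m, r).
    scaled_grad_B = np.linalg.solve(P_B, grad_B.T).T
    return A - lr * scaled_grad_A, B - lr * scaled_grad_B

# Toy usage: fit B @ A to a random target T under a squared-error loss.
rng = np.random.default_rng(0)
B = rng.normal(size=(5, 2))
A = rng.normal(size=(2, 4))
T = rng.normal(size=(5, 4))
G = B @ A - T                       # gradient of 0.5 * ||B A - T||^2 w.r.t. B @ A
A_new, B_new = preconditioned_lora_sgd_step(A, B, B.T @ G, G @ A.T)
```

Both preconditioners are only $r \times r$, so the extra cost per step is small when the rank is much smaller than the layer dimensions. Under the assumed form with `delta=0`, the updated product `B_new @ A_new` is unchanged if the factors are rescaled to `c * B` and `A / c`, which is one way to see why such a step is less sensitive to the relative scale of the two factors.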

Fangzhao Zhang, Mert Pilanci • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Question Answering | ARC-E | Accuracy 88.01 | 523 |
| Question Answering | OBQA | Accuracy 82.6 | 347 |
| Mathematical Reasoning | GSM8K | Accuracy (Acc) 74.2 | 337 |
| Reasoning | ARC | Accuracy 85.3 | 245 |
| Common Sense Reasoning | BoolQ | Accuracy 71.47 | 240 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) | BoolQ Accuracy 70.7 | 223 |
| Natural Language Understanding | GLUE (val) | SST-2 97.25 | 201 |
| Multiple-choice Question Answering | HellaSwag | Accuracy 92.61 | 196 |
| Social Interaction Question Answering | SIQA | Accuracy 79.89 | 157 |
| Natural language generation | E2E (test) | ROUGE-L 71.8 | 100 |
Showing 10 of 20 rows
