
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA

About

As large language models (LLMs) have become increasingly compute and memory intensive, parameter-efficient fine-tuning (PEFT) methods are now a common strategy to fine-tune LLMs. A popular PEFT method is Low-Rank Adapters (LoRA), which adds trainable low-rank "adapters" to selected layers. Each adapter consists of a low-rank matrix product, multiplicatively scaled by a rank-dependent factor. This scaling factor, which divides adapters by a factor of the rank, results in slowed learning and stunted performance for LoRA with higher-rank adapters. Consequently, the use of LoRA in practice has generally been limited to very low ranks. In this work, we study the impact of the scaling factor on the learning process and prove that LoRA adapters should be divided by a factor of the square root of the rank. Modifying LoRA with the appropriate scaling factor, which we call the rank-stabilized LoRA (rsLoRA) method, easily provides for a fine-tuning compute/performance trade-off, where larger ranks can be used to trade off increased computational resources during training for better fine-tuning performance, with no change in inference computing cost.

Damjan Kalajdzievski • 2023
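To make the proposed change concrete, below is a minimal PyTorch sketch contrasting the standard LoRA scaling factor α/r with the rank-stabilized rsLoRA scaling α/√r. This is an illustrative toy implementation, not the authors' code; the class name LoRALinear and the arguments r, alpha, and use_rslora are assumptions made for the example.

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter wrapped around a frozen linear layer.

    use_rslora=False scales the adapter by alpha / r (standard LoRA);
    use_rslora=True scales it by alpha / sqrt(r) (rsLoRA), which keeps the
    magnitude of the adapter's contribution stable as the rank r grows.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0,
                 use_rslora: bool = True):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen

        # Low-rank factors: B @ A has shape (out_features, in_features).
        # A gets a small random init, B starts at zero so the adapter
        # initially contributes nothing (the usual LoRA convention).
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))

        # The only difference between LoRA and rsLoRA is this scaling factor.
        self.scaling = alpha / math.sqrt(r) if use_rslora else alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# Example: a rank-64 rsLoRA adapter on a 768-dimensional projection.
layer = LoRALinear(nn.Linear(768, 768), r=64, alpha=16.0, use_rslora=True)
```

Recent versions of the Hugging Face PEFT library expose an equivalent switch (use_rslora in LoraConfig), so in practice the method amounts to flipping that flag when higher ranks are used.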

Related benchmarks

Task                           | Dataset          | Metric           | Result | Rank
-------------------------------|------------------|------------------|--------|-----
Commonsense Reasoning          | HellaSwag        | Accuracy         | 94.33  | 1891
Code Generation                | HumanEval        | Pass@1           | 16.46  | 1036
Mathematical Reasoning         | GSM8K (test)     | Accuracy         | 45.62  | 770
Mathematical Reasoning         | MATH             | Accuracy         | 5.46   | 535
Code Generation                | HumanEval (test) | Pass@1           | 16.01  | 506
Language Modeling              | WikiText2 (val)  | Perplexity (PPL) | 19.62  | 387
Reading Comprehension          | RACE high        | Accuracy         | 84.36  | 295
Reading Comprehension          | RACE mid         | Accuracy         | 87.6   | 196
Natural Language Understanding | GLUE (val)       | SST-2            | 94.19  | 191
Code Generation                | MBPP             | Pass@1           | 35.72  | 159
Showing 10 of 18 rows
