
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA

About

As large language models (LLMs) have become increasingly compute and memory intensive, parameter-efficient fine-tuning (PEFT) methods are now a common strategy to fine-tune LLMs. A popular PEFT method is Low-Rank Adapters (LoRA), which adds trainable low-rank "adapters" to selected layers. Each adapter consists of a low-rank matrix product, multiplicatively scaled by a rank-dependent factor. This scaling factor, which divides adapters by a factor of the rank, results in slowed learning and stunted performance for LoRA with higher-rank adapters. Consequently, the use of LoRA in practice has generally been limited to very low ranks. In this work, we study the impact of the scaling factor on the learning process and prove that LoRA adapters should be divided by a factor of the square root of the rank. Modifying LoRA with the appropriate scaling factor, which we call the rank-stabilized LoRA (rsLoRA) method, easily provides for a fine-tuning compute/performance trade-off, where larger ranks can be used to trade off increased computational resources during training for better fine-tuning performance, with no change in inference computing cost.

Damjan Kalajdzievski • 2023
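To make the proposed change concrete, below is a minimal PyTorch sketch contrasting the standard LoRA scaling factor α/r with the rank-stabilized rsLoRA scaling α/√r. This is an illustrative toy implementation, not the authors' code; the class name LoRALinear and the arguments r, alpha, and use_rslora are assumptions made for the example.

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter wrapped around a frozen linear layer.

    use_rslora=False scales the adapter by alpha / r (standard LoRA);
    use_rslora=True scales it by alpha / sqrt(r) (rsLoRA), which keeps the
    magnitude of the adapter's contribution stable as the rank r grows.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0,
                 use_rslora: bool = True):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen

        # Low-rank factors: B @ A has shape (out_features, in_features).
        # A gets a small random init, B starts at zero so the adapter
        # initially contributes nothing (the usual LoRA convention).
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))

        # The only difference between LoRA and rsLoRA is this scaling factor.
        self.scaling = alpha / math.sqrt(r) if use_rslora else alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# Example: a rank-64 rsLoRA adapter on a 768-dimensional projection.
layer = LoRALinear(nn.Linear(768, 768), r=64, alpha=16.0, use_rslora=True)
```

Recent versions of the Hugging Face PEFT library expose an equivalent switch (use_rslora in LoraConfig), so in practice the method amounts to flipping that flag when higher ranks are used.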

Related benchmarks

Task                           | Dataset          | Metric           | Result | Rank
-------------------------------|------------------|------------------|--------|-----
Commonsense Reasoning          | HellaSwag        | Accuracy         | 94.33  | 1891
Code Generation                | HumanEval        | Pass@1           | 16.46  | 1036
Mathematical Reasoning         | GSM8K (test)     | Accuracy         | 45.62  | 770
Mathematical Reasoning         | MATH             | Accuracy         | 5.46   | 535
Code Generation                | HumanEval (test) | Pass@1           | 16.01  | 506
Language Modeling              | WikiText2 (val)  | Perplexity (PPL) | 19.62  | 387
Reading Comprehension          | RACE high        | Accuracy         | 84.36  | 295
Reading Comprehension          | RACE mid         | Accuracy         | 87.6   | 196
Natural Language Understanding | GLUE (val)       | SST-2            | 94.19  | 191
Code Generation                | MBPP             | Pass@1           | 35.72  | 159
Showing 10 of 18 rows
