LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization

About

Low-rank adaption (LoRA) is a widely used parameter-efficient finetuning method for LLM that reduces memory requirements. However, current LoRA optimizers lack transformation invariance, meaning the actual updates to the weights depends on how the two LoRA factors are scaled or rotated. This deficiency leads to inefficient learning and sub-optimal solutions in practice. This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization, which can achieve transformation invariance and remain computationally efficient. We provide theoretical analysis to demonstrate the benefit of our method and conduct experiments on various LLM tasks with different models including Gemma 2B, 7B, and mT5-XXL. The results demonstrate consistent improvements against existing optimizers. For example, replacing Adam with LoRA-RITE during LoRA fine-tuning of Gemma-2B yielded 4.6\% accuracy gain on Super-Natural Instructions and 3.5\% accuracy gain across other four LLM benchmarks (HellaSwag, ArcChallenge, GSM8K, OpenBookQA).

Jui-Nan Yen, Si Si, Zhao Meng, Felix Yu, Sai Surya Duvvuri, Inderjit S. Dhillon, Cho-Jui Hsieh, Sanjiv Kumar• 2024

Related benchmarks

Task	Dataset	Result
Question Answering	ARC-E	Accuracy88.26	523
Question Answering	OBQA	Accuracy83.4	347
Common Sense Reasoning	BoolQ	Accuracy74.19	240
Multiple-choice Question Answering	HellaSwag	Accuracy93.21	196
Social Interaction Question Answering	SIQA	Accuracy80.25	157
Mathematical Reasoning	Math10K	SVAMP73.6	51
Question Answering	ARC-C	Accuracy71.5	28
Natural Language Understanding	GLUE8 (val)	CoLA Score0.515	18
Natural Language Understanding	GLUE base (test dev)	CoLA MCC69.55	11
Fine-tuning	OpenHermes	Evaluation Loss0.707	7

Showing 10 of 15 rows

Other info

Follow for update

@wizwand_team Discord