DoRA: Weight-Decomposed Low-Rank Adaptation

About

Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, an accuracy gap often remains between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Guided by these findings and aiming to match the learning capacity of FT, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, and employs LoRA specifically for the directional updates to keep the number of trainable parameters small. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding. Code is available at https://github.com/NVlabs/DoRA.

Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen • 2024
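
The decomposition described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes the magnitude is the per-column L2 norm of the weight matrix and that the low-rank factor B starts at zero; the function names `dora_init` and `dora_weight`, the matrix sizes, and the initialization scale are hypothetical choices made for illustration.

```python
import numpy as np

def dora_init(W0, rank, rng):
    """Split a pre-trained weight into a magnitude vector and LoRA factors.

    Assumption: magnitude = L2 norm of each column of W0.
    """
    d, k = W0.shape
    m = np.linalg.norm(W0, axis=0, keepdims=True)  # (1, k) magnitude, trainable
    B = np.zeros((d, rank))                        # zero init: no change at step 0
    A = rng.normal(scale=0.01, size=(rank, k))     # small random init (illustrative)
    return m, B, A

def dora_weight(W0, m, B, A):
    """Recombine magnitude and direction into a single merged weight."""
    V = W0 + B @ A                                 # frozen W0 plus low-rank directional update
    direction = V / np.linalg.norm(V, axis=0, keepdims=True)  # unit-norm columns
    return m * direction                           # rescale each column by its magnitude

rng = np.random.default_rng(0)
W0 = rng.normal(size=(8, 6))                       # stand-in for a pre-trained weight
m, B, A = dora_init(W0, rank=2, rng=rng)
W = dora_weight(W0, m, B, A)

# With B initialised to zero, the merged weight equals the pre-trained weight.
print(np.allclose(W, W0))
# Trainable parameters (m, B, A) versus a full update of W0.
print(m.size + B.size + A.size, W0.size)
```

Because `dora_weight` returns a matrix with the same shape as `W0`, the adapted weight can be merged once after training and used like an ordinary weight, which is consistent with the abstract's claim of no additional inference overhead.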

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	HellaSwag	Accuracy 37.44	1896
Visual Question Answering	VQA v2	Accuracy 65.8	1429
Mathematical Reasoning	GSM8K	Accuracy 76.7	1398
Code Generation	HumanEval	Pass@1 19.75	1043
Mathematical Reasoning	GSM8K (test)	Accuracy 81.2	954
Question Answering	ARC Challenge	Accuracy 50.8	906
Multi-task Language Understanding	MMLU	Accuracy 11.67	881
Mathematical Reasoning	GSM8K (test)	Accuracy 75.66	816
Image Classification	CIFAR-100 (val)	Accuracy 94.11	781
Commonsense Reasoning	PIQA	Accuracy 82.7	757

Showing 10 of 315 rows

