
RoCoFT: Efficient Finetuning of Large Language Models with Row-Column Updates

About

We propose RoCoFT, a parameter-efficient fine-tuning method for large-scale language models (LMs) based on updating only a few rows and columns of the weight matrices in transformers. Through extensive experiments with medium-sized LMs like BERT and RoBERTa, and larger LMs like Bloom-7B, Llama2-7B, and Llama2-13B, we show that our method achieves accuracies comparable to or better than state-of-the-art PEFT methods while being more memory- and computation-efficient. We also study the reason behind the effectiveness of our method with tools from neural tangent kernel theory. We empirically demonstrate that our kernel, constructed using a restricted set of row and column parameters, is numerically close to the full-parameter kernel and gives comparable classification performance. Ablation studies investigate the impact of different algorithmic choices, including the selection strategy for rows and columns and the optimal rank for effective implementation of our method.
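The core idea can be sketched in a few lines: freeze the weight matrix and apply the gradient step only to a chosen slice of rows (or columns). The sketch below is illustrative, not the authors' implementation; the function name `rocoft_update`, the plain SGD step, and the "first `rank` indices" selection are assumptions (the paper ablates different selection strategies).

```python
import numpy as np

def rocoft_update(W, grad, lr=0.1, rank=2, mode="row"):
    """RoCoFT-style step: only `rank` rows (or columns) of W are
    trainable; every other entry stays frozen at its pretrained value.
    Selecting the first `rank` indices is an illustrative assumption."""
    W_new = W.copy()
    if mode == "row":
        W_new[:rank, :] -= lr * grad[:rank, :]
    else:  # column update
        W_new[:, :rank] -= lr * grad[:, :rank]
    return W_new

# Toy example: a 4x4 weight matrix with a uniform gradient.
W = np.ones((4, 4))
g = np.ones((4, 4))

W_row = rocoft_update(W, g, lr=0.5, rank=2, mode="row")
# First two rows step to 0.5; the frozen rows remain 1.0.

W_col = rocoft_update(W, g, lr=0.5, rank=1, mode="column")
# Only the first column is updated.
```

With rank r on a d x d matrix, this trains r*d parameters instead of d*d, which is where the memory and compute savings over full fine-tuning come from.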

Md Kowsher, Tara Esmaeilbeig, Chun-Nam Yu, Chen Chen, Mojtaba Soltanalian, Niloofar Yousefi • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Natural Language Understanding | GLUE | SST-2: 96.69 | 452 |
| Natural Language Understanding | GLUE (test) | SST-2 Accuracy: 96.6 | 416 |
| Question Answering | SQuAD 2.0 | F1: 85.14 | 190 |
| Summarization | XSum | ROUGE-2: 18.54 | 108 |
| Question Answering | SQuAD v1.1 | F1: 88.15 | 79 |
| Summarization | CNN / Daily Mail | ROUGE-1: 40.83 | 67 |
| Commonsense Reasoning | BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA | BoolQ Accuracy: 71.46 | 61 |
| Mathematical Reasoning | MathQA, GSM8K, AddSub, SingleEq, SVAMP | MathQA Accuracy: 91.46 | 24 |

Other info

Code
