
Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning

About

Parameter-efficient fine-tuning (PEFT) of pre-trained language models (PLMs) has emerged as a highly successful approach: it trains only a small number of parameters without sacrificing performance and has become the de facto learning paradigm as PLMs grow in size. However, existing PEFT methods are not memory-efficient, because they still require caching most of the intermediate activations for gradient computation, akin to full fine-tuning. One effective way to reduce activation memory is to use a reversible model, whose intermediate activations need not be cached because they can be recomputed. Nevertheless, modifying a PLM into its reversible variant is not straightforward, since a reversible model has an architecture distinct from that of currently released PLMs. In this paper, we first investigate a key factor behind the success of existing PEFT methods and find that it is essential to preserve the PLM's starting point when initializing a PEFT method. With this finding, we propose memory-efficient fine-tuning (MEFT), which inserts adapters into a PLM in a way that preserves the PLM's starting point and makes the model reversible without additional pre-training. We evaluate MEFT on the GLUE benchmark and five question-answering tasks with various backbones: BERT, RoBERTa, BART, and OPT. MEFT reduces activation memory by up to 84% compared to full fine-tuning while training only a negligible number of parameters. Moreover, MEFT matches full fine-tuning on GLUE and achieves comparable scores on the question-answering tasks. Similar findings are observed for image classification.

Baohao Liao, Shaomu Tan, Christof Monz • 2023
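
To make the abstract's two key ideas concrete, below is a minimal PyTorch sketch of (1) a RevNet-style additive coupling, which lets inputs be recomputed from outputs so intermediate activations need not be cached, and (2) a zero-initialized bottleneck adapter, which keeps the modified network at the pre-trained model's starting point at initialization. This is an illustrative sketch under those assumptions, not the paper's exact MEFT formulation; the class names, bottleneck size, and coupling layout are hypothetical.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter. The up-projection is zero-initialized, so the
    adapter outputs zeros at initialization and the surrounding block
    starts as an identity map (the PLM's starting point is preserved)."""
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return self.up(torch.relu(self.down(x)))

class ReversibleBlock(nn.Module):
    """RevNet-style additive coupling over two hidden streams. Because the
    mapping is invertible, inputs can be recomputed from outputs during the
    backward pass instead of being cached in memory."""
    def __init__(self, f, g):
        super().__init__()
        self.f = f  # e.g., a (frozen) sub-layer wrapped with an adapter
        self.g = g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Exactly undo the forward pass: no activation caching required.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2
```

A quick check of both properties: at initialization the block is the identity, and after the adapter weights change during fine-tuning, the inputs remain exactly recoverable from the outputs:

```python
blk = ReversibleBlock(Adapter(768), Adapter(768))
x1, x2 = torch.randn(4, 768), torch.randn(4, 768)

# Starting point preserved: zero-initialized adapters make the block an identity.
y1, y2 = blk(x1, x2)
assert torch.allclose(y1, x1) and torch.allclose(y2, x2)

# After (simulated) training, activations are still recomputable via inverse().
with torch.no_grad():
    blk.f.up.weight.add_(0.01 * torch.randn_like(blk.f.up.weight))
    blk.g.up.weight.add_(0.01 * torch.randn_like(blk.g.up.weight))
y1, y2 = blk(x1, x2)
r1, r2 = blk.inverse(y1, y2)
assert torch.allclose(r1, x1, atol=1e-5) and torch.allclose(r2, x2, atol=1e-5)
```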

Related benchmarks

Task | Dataset | Metric | Result | Rank
Question Answering | ARC Challenge | Accuracy | 34.1 | 749
Question Answering | OpenBookQA | Accuracy | 37 | 465
Natural Language Understanding | GLUE (test) | SST-2 Accuracy | 96.8 | 416
Question Answering | ARC Easy | Normalized Acc | 65.7 | 385
Physical Interaction Question Answering | PIQA | Accuracy | 77.4 | 323
Question Answering | SciQ | Accuracy | 94.4 | 226
Natural Language Understanding | GLUE excluding STS-B | Average Score | 88.4 | 4

Other info

Code
