Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

About

Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wasteful as it requires storing a separate copy of the model for each task. Recent work has developed parameter-efficient fine-tuning methods, but these approaches either still require a relatively large number of parameters or underperform standard fine-tuning. In this work, we propose Compacter, a method for fine-tuning large-scale language models with a better trade-off between task performance and the number of trainable parameters than prior work. Compacter accomplishes this by building on top of ideas from adapters, low-rank optimization, and parameterized hypercomplex multiplication layers. Specifically, Compacter inserts task-specific weight matrices into a pretrained model's weights, which are computed efficiently as a sum of Kronecker products between shared "slow" weights and "fast" rank-one matrices defined per Compacter layer. By only training 0.047% of a pretrained model's parameters, Compacter performs on par with standard fine-tuning on GLUE and outperforms standard fine-tuning on SuperGLUE and low-resource settings. Our code is publicly available at https://github.com/rabeehk/compacter.
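
As a rough illustration of the construction described above (a weight matrix formed as a sum of Kronecker products between shared "slow" matrices and per-layer "fast" rank-one matrices), the NumPy sketch below may help. The function name `compacter_weight`, the variable names, and the concrete shapes (n slow matrices of size n x n, and a k x d result) are assumptions chosen for illustration; they are not taken from the authors' code, and the sketch covers only the weight construction, not the adapter layer or training.

```python
import numpy as np


def compacter_weight(slow, fast_left, fast_right):
    """Return W = sum_i kron(A_i, outer(s_i, t_i)).

    slow:       array of shape (n, n, n); slow[i] is the shared matrix A_i
    fast_left:  array of shape (n, k // n); row i is the vector s_i
    fast_right: array of shape (n, d // n); row i is the vector t_i
    The result has shape (k, d). All shapes are illustrative assumptions.
    """
    terms = [
        # outer(s_i, t_i) is a rank-one matrix; kron expands it by A_i
        np.kron(slow[i], np.outer(fast_left[i], fast_right[i]))
        for i in range(slow.shape[0])
    ]
    return np.sum(terms, axis=0)


# Hypothetical sizes, chosen only to make the example concrete.
n, k, d = 4, 24, 768
rng = np.random.default_rng(0)
slow = rng.standard_normal((n, n, n))
fast_left = rng.standard_normal((n, k // n))
fast_right = rng.standard_normal((n, d // n))

weight = compacter_weight(slow, fast_left, fast_right)

# Number of stored values versus the size of the dense matrix they generate.
stored = slow.size + fast_left.size + fast_right.size
print(weight.shape, stored, weight.size)
```

In this toy setting the factors hold far fewer values than the dense matrix they produce. Because the slow matrices are described as shared, their cost would be spread across layers rather than paid once per layer.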

Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Natural Language Understanding | GLUE (dev) | SST-2 (Acc) 92.3 | 504 |
| Natural Language Understanding | GLUE | SST-2 96.3 | 452 |
| Natural Language Understanding | GLUE (val) | -- | 170 |
| Visual Question Answering | VQA (test-dev) | Acc (All) 66.5 | 147 |
| Visual Question Answering | VQA 2.0 (val) | Accuracy (Overall) 40.44 | 143 |
| Visual Question Answering | VQA v2 (val) | -- | 99 |
| Natural Language Understanding | SuperGLUE | SGLUE Score 72.74 | 84 |
| Text Classification | 20 Newsgroups (test) | Accuracy 83.2 | 71 |
| Multi-Task Adaptation | Pascal Context (test) | Seg Acc 76.33 | 70 |
| Saliency Detection | Pascal Context (test) | -- | 57 |
Showing 10 of 25 rows
