HyperPELT: Unified Parameter-Efficient Language Model Tuning for Both Language and Vision-and-Language Tasks
About
The workflow of pretraining and fine-tuning has emerged as a popular paradigm for solving various NLP and V&L (Vision-and-Language) downstream tasks. With the capacity of pretrained models growing rapidly, parameter-efficient fine-tuning has become increasingly important for quick transfer learning and deployment. In this paper, we design a novel unified parameter-efficient transfer learning framework that works effectively on both pure language and V&L tasks. In particular, we use a shared hypernetwork that takes trainable hyper-embeddings as input and outputs weights for fine-tuning different small modules in a pretrained language model, such as the parameters inserted into multi-head attention blocks (i.e., prefix-tuning) and feed-forward blocks (i.e., adapter-tuning). We define a set of embeddings (e.g., layer, block, task, and visual embeddings) as the key components for computing hyper-embeddings, which thus can support both pure language and V&L tasks. Our proposed framework adds fewer trainable parameters in multi-task learning while achieving superior performance and transfer ability compared to state-of-the-art methods. Empirical results on the GLUE benchmark and multiple V&L tasks confirm the effectiveness of our framework on both textual and visual modalities.
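The core mechanism above — composing layer, block, task, and (optionally) visual embeddings into a hyper-embedding, then letting a shared hypernetwork generate the weights of a small adapter module — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: all dimensions, the concatenation-plus-projection used to form the hyper-embedding, and the single linear layer used as the hypernetwork are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration only.
d_embed = 8       # size of each source embedding (layer / block / task / visual)
d_model = 16      # hidden size of the pretrained language model
d_bottleneck = 4  # adapter bottleneck size

# Trainable source embeddings that are combined into a hyper-embedding.
layer_emb = rng.standard_normal(d_embed)
block_emb = rng.standard_normal(d_embed)    # e.g. attention vs. feed-forward block
task_emb = rng.standard_normal(d_embed)
visual_emb = rng.standard_normal(d_embed)   # could be zeroed for pure language tasks

# Hyper-embedding: here, a projection of the concatenated source embeddings
# (the actual composition in the paper may differ).
W_h = rng.standard_normal((4 * d_embed, d_embed))
hyper_emb = np.concatenate([layer_emb, block_emb, task_emb, visual_emb]) @ W_h

# Shared hypernetwork: a single linear map whose flat output is reshaped into
# the down- and up-projection weights of one adapter module.
n_params = d_model * d_bottleneck * 2
W_hyper = rng.standard_normal((d_embed, n_params))
flat = hyper_emb @ W_hyper
W_down = flat[: d_model * d_bottleneck].reshape(d_model, d_bottleneck)
W_up = flat[d_model * d_bottleneck :].reshape(d_bottleneck, d_model)

def adapter(x):
    """Adapter forward pass with a ReLU bottleneck and a residual connection."""
    return x + np.maximum(x @ W_down, 0.0) @ W_up

x = rng.standard_normal(d_model)
print(adapter(x).shape)  # (16,)
```

Only the hypernetwork and the embeddings are trainable here; the backbone weights that `adapter` wraps stay frozen, which is where the parameter efficiency comes from — one shared hypernetwork serves every layer, block, and task instead of a separate adapter per position.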
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Sentiment Analysis | IMDB (test) | Accuracy | 90.5 | 248 |
| Question Classification | TREC (test) | Accuracy | 97.2 | 124 |
| Visual Question Answering | OKVQA (val) | VQA Score | 35.86 | 101 |
| Question Answering | BoolQ (test) | Accuracy | 75.7 | 46 |
| Natural Language Inference | CB SuperGLUE (test) | Accuracy | 91.43 | 33 |
| Visual Entailment | SNLI-VE (test-p) | Accuracy | 65.67 | 24 |
| Paraphrase Detection | PAWS original (test) | Accuracy | 91.79 | 23 |