Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

About

Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction (a hypernetwork adapter), we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single task fine-tuning methods while being parameter and data efficient (using around 66% of the data for weight updates). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms by 0.7-1.0% models that use MTL and single task fine-tuning. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets. Our code is publicly available at https://github.com/CAMTL/CA-MTL.

Jonathan Pilault, Amine Elhattami, Christopher Pal • 2020

Related benchmarks

Task	Dataset	Result
Natural Language Inference	SNLI (test)	Accuracy 92.1	694
Natural Language Understanding	GLUE (dev)	SST-2 (Acc) 94.5	529
Natural Language Understanding	GLUE (test)	SST-2 Accuracy 96.3	416
Natural Language Inference	SNLI	Accuracy 92.1	196
Natural Language Inference	SciTail (test)	Accuracy 96.8	86
Named Entity Recognition	WNUT 2017 (test)	F1 Score 58	63
Natural Language Understanding, Question Answering, and Named Entity Recognition	GLUE, SuperGLUE, MRQA, and WNUT2017 NER (24-task suite) v1 (dev)	GLUE Score 89.4	6

