Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

About

Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as overfitting to low-resource tasks, catastrophic forgetting, and negative task transfer (also called learning interference). Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter-inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction (a hypernetwork adapter), we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single-task fine-tuning methods while being parameter- and data-efficient (using around 66% of the data for weight updates). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8%, and our 24-task model outperforms models that use MTL and single-task fine-tuning by 0.7-1.0%. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets. Our code is publicly available at https://github.com/CAMTL/CA-MTL.
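The abstract does not spell out how the multi-task sampling strategy works. As a rough illustration only, one way to counter data imbalance is to pick training examples by model uncertainty rather than by dataset size: score each candidate by the Shannon entropy of the model's predicted distribution, normalize by the maximum entropy for that task's label count so tasks with different numbers of classes are comparable, and keep the most uncertain candidates. The function names (`normalized_entropy`, `select_uncertain`), the candidate tuples, and the probabilities below are hypothetical and are not taken from the CA-MTL codebase.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a predictive distribution, scaled to [0, 1].

    Dividing by log(number of classes) makes scores comparable across
    tasks with different label counts (an assumption of this sketch).
    """
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)


def select_uncertain(candidates, batch_size):
    """Return the (task, example_id) pairs with the highest uncertainty.

    candidates: list of (task_name, example_id, probs) tuples, where
    probs is the model's predicted class distribution for that example.
    """
    scored = sorted(
        candidates,
        key=lambda c: normalized_entropy(c[2]),
        reverse=True,
    )
    return [(task, ex_id) for task, ex_id, _ in scored[:batch_size]]


# Hypothetical candidates drawn from two tasks with different label counts.
candidates = [
    ("sst2", 0, [0.98, 0.02]),
    ("sst2", 1, [0.55, 0.45]),
    ("mnli", 0, [0.34, 0.33, 0.33]),
    ("mnli", 1, [0.90, 0.05, 0.05]),
]
batch = select_uncertain(candidates, batch_size=2)
```

With these made-up probabilities, the two examples the model is least sure about are selected regardless of which task they come from, so a small task is not drowned out simply because a larger task has more data.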

Jonathan Pilault, Amine Elhattami, Christopher Pal • 2020

Related benchmarks

Task | Dataset | Result | Rank
Natural Language Inference | SNLI (test) | Accuracy 92.1 | 681
Natural Language Understanding | GLUE (dev) | SST-2 (Acc) 94.5 | 504
Natural Language Understanding | GLUE (test) | SST-2 Accuracy 96.3 | 416
Natural Language Inference | SNLI | Accuracy 92.1 | 174
Natural Language Inference | SciTail (test) | Accuracy 96.8 | 86
Named Entity Recognition | WNUT 2017 (test) | F1 Score 58 | 63
Natural Language Understanding, Question Answering, and Named Entity Recognition | GLUE, SuperGLUE, MRQA, and WNUT2017 NER (24-task suite) v1 (dev) | GLUE Score 89.4 | 6
