
Muppet: Massive Multi-task Representations with Pre-Finetuning

About

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g., RoBERTa) and generation models (e.g., BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial: pre-finetuning can hurt performance when only a few tasks are used, up to a critical point (usually above 15 tasks), after which performance improves linearly with the number of tasks.
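To make the multi-task setup concrete, the sketch below shows one common way such a stage can be organized: each heterogeneous batch draws examples from many tasks in proportion to dataset size, so that each task-specific head sees its own slice of the batch. This is a minimal illustration of proportional task sampling, not the paper's exact training recipe; the task names, sizes, and helper functions are invented for the example.

```python
import random

def make_task_sampler(dataset_sizes, seed=0):
    """Return a function that samples a task name in proportion to its dataset size."""
    rng = random.Random(seed)
    names = list(dataset_sizes)
    weights = [dataset_sizes[n] for n in names]

    def sample():
        return rng.choices(names, weights=weights, k=1)[0]

    return sample

def build_batch(sample_task, batch_size):
    """Count how many examples each task contributes to one heterogeneous batch."""
    counts = {}
    for _ in range(batch_size):
        task = sample_task()
        counts[task] = counts.get(task, 0) + 1
    return counts

# Hypothetical dataset sizes (illustrative only).
sizes = {"mnli": 392_702, "squad": 87_599, "boolq": 9_427}
sampler = make_task_sampler(sizes)
batch = build_batch(sampler, 128)
```

Sampling proportionally to dataset size keeps large datasets from being starved while still exposing the model to every task; in practice one would also normalize per-task losses so that tasks with different label spaces contribute comparably to the gradient.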

Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, Sonal Gupta • 2021

Related benchmarks

Task                                     | Dataset     | Metric   | Result | Rank
Commonsense Reasoning                    | HellaSwag   | Accuracy | 86.4   | 1460
Natural Language Inference               | RTE         | Accuracy | 39.44  | 367
Physical Interaction Question Answering  | PIQA        | Accuracy | 55.47  | 323
Boolean Question Answering               | BoolQ       | Accuracy | 74.27  | 307
Question Answering                       | OBQA        | Accuracy | 39.47  | 276
Question Answering                       | BoolQ       | Accuracy | 82.17  | 240
Question Classification                  | TREC        | Accuracy | 96.8   | 205
Topic Classification                     | AG-News     | Accuracy | 89.77  | 173
Natural Language Understanding           | GLUE (val)  | SST-2    | 97.4   | 170
Common Sense Reasoning                   | WinoGrande  | Accuracy | 55.49  | 156

Showing 10 of 77 rows.

Other info

Code
