Balancing Average and Worst-case Accuracy in Multitask Learning

About

When training and evaluating machine learning models on a large number of tasks, it is important to not only look at average task accuracy -- which may be biased by easy or redundant tasks -- but also worst-case accuracy (i.e. the performance on the task with the lowest accuracy). In this work, we show how to use techniques from the distributionally robust optimization (DRO) literature to improve worst-case performance in multitask learning. We highlight several failure cases of DRO when applied off-the-shelf and present an improved method, Lookahead-DRO (L-DRO), which mitigates these issues. The core idea of L-DRO is to anticipate the interaction between tasks during training in order to choose a dynamic re-weighting of the various task losses, which will (i) lead to minimal worst-case loss and (ii) train on as many tasks as possible. After demonstrating the efficacy of L-DRO on a small controlled synthetic setting, we evaluate it on two realistic benchmarks: a multitask version of the CIFAR-100 image classification dataset and a large-scale multilingual language modeling experiment. Our empirical results show that L-DRO achieves a better trade-off between average and worst-case accuracy with little computational overhead compared to several strong baselines.

Paul Michel, Sebastian Ruder, Dani Yogatama • 2021
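
The abstract contrasts average with worst-case accuracy and describes DRO as a dynamic re-weighting of task losses. As a rough illustration only, the sketch below computes both metrics and applies one generic exponentiated-gradient re-weighting step of the kind used in plain DRO. It is not the authors' L-DRO method; the function names, the `step_size` parameter, and the example numbers are all assumptions made for this sketch.

```python
import math


def average_and_worst(accuracies):
    """Return (average accuracy, worst-case accuracy) over per-task accuracies."""
    return sum(accuracies) / len(accuracies), min(accuracies)


def dro_reweight(weights, losses, step_size=1.0):
    """One exponentiated-gradient step on the task weights.

    Tasks with a higher loss receive a larger weight; the result is
    renormalized so the weights still sum to 1. This is a generic DRO-style
    update (hypothetical helper), without the lookahead step of L-DRO.
    """
    scaled = [w * math.exp(step_size * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def weighted_loss(weights, losses):
    """Training objective under the current task weights."""
    return sum(w * loss for w, loss in zip(weights, losses))


# Made-up per-task numbers, for illustration only.
task_accuracies = [0.9, 0.8, 0.4]
task_losses = [0.1, 0.2, 1.0]

avg_acc, worst_acc = average_and_worst(task_accuracies)
uniform = [1.0 / len(task_losses)] * len(task_losses)
reweighted = dro_reweight(uniform, task_losses)
```

With uniform starting weights, the update shifts weight toward the third task, which has the highest loss, so the weighted objective puts more emphasis on the task that is currently worst.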

Related benchmarks

Task                                               Dataset      Result          Rank
Multimodal Understanding                           MMBench      Accuracy 36.34  367
Multi-discipline Multimodal Understanding          MMMU         Accuracy 45.67  266
Science Question Answering                         ScienceQA    Accuracy 62.42  229
Visual Question Answering                          ScienceQA    Accuracy 83.44  210
Multimodal Understanding                           MMStar       Accuracy 33.45  197
Visual Question Answering                          AI2D         Accuracy 71.83  174
Optical Character Recognition Benchmarking         OCRBench     Accuracy 56.5   109
Visual Question Answering                          RealworldQA  Accuracy 57.91  98
Real-world Visual Question Answering               RealworldQA  Accuracy 46.27  91
Massive Multi-discipline Multimodal Understanding  MMMU         Accuracy 30     88
Showing 10 of 19 rows
