Efficiently Identifying Task Groupings for Multi-Task Learning
About
Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naively training all tasks together in one model often degrades performance, and exhaustively searching through combinations of task groupings can be prohibitively expensive. As a result, efficiently identifying the tasks that would benefit from training together remains a challenging design question without a clear solution. In this paper, we suggest an approach to select which tasks should train together in multi-task learning models. Our method determines task groupings in a single run by training all tasks together and quantifying the extent to which one task's gradient would affect another task's loss. On the large-scale Taskonomy computer vision dataset, we find this method can decrease test loss by 10.0% compared to simply training all tasks together while operating 11.6 times faster than a state-of-the-art task grouping method.
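The idea of measuring how one task's gradient affects another task's loss can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy setting in which each task's loss is the squared distance from a shared parameter vector to a task-specific target, and it assumes the effect is scored as the relative change in task j's loss after one gradient step on task i. The names `task_loss`, `task_grad`, and `lookahead_affinity`, as well as the learning rate, are hypothetical and chosen only for illustration.

```python
import numpy as np


def task_loss(shared, target):
    """Toy per-task loss (an assumption): half the squared distance to a task target."""
    return 0.5 * float(np.sum((shared - target) ** 2))


def task_grad(shared, target):
    """Gradient of task_loss with respect to the shared parameters."""
    return shared - target


def lookahead_affinity(shared, targets, lr=0.1):
    """Score how a gradient step on task i changes the loss of task j.

    affinity[i, j] = 1 - L_j(shared after a step on task i) / L_j(shared).
    Positive values mean the step on task i lowered task j's loss.
    Assumes every task has a nonzero loss at `shared`.
    """
    n = len(targets)
    affinity = np.zeros((n, n))
    for i in range(n):
        # Hypothetical lookahead update of the shared parameters using task i only.
        stepped = shared - lr * task_grad(shared, targets[i])
        for j in range(n):
            before = task_loss(shared, targets[j])
            after = task_loss(stepped, targets[j])
            affinity[i, j] = 1.0 - after / before
    return affinity


# Tasks 0 and 1 have nearby targets; task 2 pulls in the opposite direction.
shared = np.zeros(2)
targets = [np.array([1.0, 0.0]), np.array([1.0, 0.2]), np.array([-1.0, 0.0])]
affinity = lookahead_affinity(shared, targets)
print(np.round(affinity, 3))
```

In this toy setup, a step on task 0 lowers the loss of task 1 (positive score) and raises the loss of task 2 (negative score), which is the kind of signal that could be used to decide which tasks to group. In a real model, the scores would typically be averaged over many training steps before any grouping decision is made.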
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multi-task Learning | Ridership | Total Loss | 17.5 | 20 |
| Multi-task Learning | ETTm1 | Total Loss | 3.9 | 20 |
| Multi-task Learning | Chemical | Total Loss | 4.69 | 20 |
| Multi-task Learning | CelebA | Total Loss | 49.67 | 20 |
| Graph Algorithmic Reasoning | CLRS (test) | BFS Accuracy | 1 | 14 |
| Natural Language Understanding | Multi-task NLP Suite (BoolQ, CB, COPA, H-SWAG, MultiRC, Story Cloze, Winogrande) Llama-3-8B base (test) | BoolQ Accuracy | 88.5 | 6 |
| Pairwise MTL affinity prediction | CelebA 2015b (test) | Pearson Correlation | 0.16 | 3 |
| Pairwise MTL affinity prediction | ETTm1 2021 (test) | Pearson Correlation | 0.43 | 3 |
| Pairwise MTL affinity prediction | Chemical 2008 (test) | Pearson Correlation | 0.33 | 3 |
| Pairwise MTL affinity prediction | Ridership 2023 (test) | Pearson Correlation Coefficient | 0.1 | 3 |