In Defense of the Unitary Scalarization for Deep Multi-Task Learning
About
Recent multi-task learning research argues against unitary scalarization, where training simply minimizes the sum of the task losses. Several ad-hoc multi-task optimization algorithms have instead been proposed, inspired by various hypotheses about what makes multi-task settings difficult. The majority of these optimizers require per-task gradients, and introduce significant memory, runtime, and implementation overhead. We show that unitary scalarization, coupled with standard regularization and stabilization techniques from single-task learning, matches or improves upon the performance of complex multi-task optimizers in popular supervised and reinforcement learning settings. We then present an analysis suggesting that many specialized multi-task optimizers can be partly interpreted as forms of regularization, potentially explaining our surprising results. We believe our results call for a critical reevaluation of recent research in the area.
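Unitary scalarization simply sums the per-task losses into a single objective, so one backward pass suffices and no per-task gradients need to be stored. The following is a minimal sketch with hypothetical quadratic losses for two tasks on a shared scalar parameter `w` (the function names and values are illustrative, not from the paper):

```python
# Unitary scalarization sketch: two hypothetical task losses on a
# shared parameter w; the training objective is simply their sum.

def task_a_loss(w):
    # hypothetical quadratic loss for task A
    return (w - 1.0) ** 2

def task_b_loss(w):
    # hypothetical quadratic loss for task B
    return (w + 2.0) ** 2

def unitary_scalarization(w, losses):
    # the unitary scalarization objective: plain sum of task losses
    return sum(loss(w) for loss in losses)

def grad(f, w, eps=1e-6):
    # numerical gradient via central differences
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)

# Plain gradient descent on the summed objective; unlike specialized
# multi-task optimizers, no per-task gradients are computed or combined.
w, lr = 0.0, 0.1
for _ in range(100):
    w -= lr * grad(lambda x: unitary_scalarization(x, [task_a_loss, task_b_loss]), w)

# The minimizer of (w-1)^2 + (w+2)^2 is w = -0.5.
print(round(w, 3))  # → -0.5
```

In a deep-learning framework the same idea is `total_loss = sum(task_losses); total_loss.backward()`, which is exactly the low-overhead baseline the paper defends.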
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Depth Estimation | NYU v2 (test) | -- | 423 |
| Semantic Segmentation | NYU v2 (test) | mIoU: 52.02 | 248 |
| Surface Normal Estimation | NYU v2 (test) | Mean Angle Distance (MAD): 23.79 | 206 |
| Image Classification | Office-Home (test) | -- | 199 |
| Multi-Task Learning | NYU v2 (test) | -- | 31 |
| Multi-Task Learning | NYU v2 | mIoU: 53.77 | 19 |
| Multi-Objective Learning | Office-31 | Amazon Accuracy: 0.8102 | 8 |
| Image Classification | MNIST (test) | Cross-Entropy Loss: 306.9 | 3 |