Multi-Task Learning as Multi-Objective Optimization
About
In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. Multi-task learning is inherently a multi-objective problem because different tasks may conflict, necessitating a trade-off. A common compromise is to optimize a proxy objective that minimizes a weighted linear combination of per-task losses. However, this workaround is only valid when the tasks do not compete, which is rarely the case. In this paper, we explicitly cast multi-task learning as multi-objective optimization, with the overall objective of finding a Pareto optimal solution. To this end, we use algorithms developed in the gradient-based multi-objective optimization literature. These algorithms are not directly applicable to large-scale learning problems since they scale poorly with the dimensionality of the gradients and the number of tasks. We therefore propose an upper bound for the multi-objective loss and show that it can be optimized efficiently. We further prove that optimizing this upper bound yields a Pareto optimal solution under realistic assumptions. We apply our method to a variety of multi-task deep learning problems including digit classification, scene understanding (joint semantic segmentation, instance segmentation, and depth estimation), and multi-label classification. Our method produces higher-performing models than recent multi-task learning formulations or per-task training.
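The core computational step in gradient-based multi-objective optimization (MGDA-style methods) is to find convex weights over the per-task gradients whose combination has minimum norm. The negative of that combination is a direction that does not increase any task loss, and a zero minimum indicates a Pareto-stationary point. The following is a minimal NumPy sketch under stated assumptions, not the authors' code. It shows the closed-form solution for two tasks and a Frank-Wolfe solver for T tasks. The function names `two_task_weight` and `min_norm_weights` are hypothetical, and the rows of `grads` are assumed to be flattened per-task gradients. In the paper's upper-bound formulation these are gradients with respect to the shared representation rather than the shared parameters. Once the T x T Gram matrix is formed, the solver works only with inner products, which is why its cost does not grow with gradient dimensionality.

```python
import numpy as np


def two_task_weight(g1: np.ndarray, g2: np.ndarray) -> float:
    """Closed-form gamma in [0, 1] minimizing ||gamma*g1 + (1 - gamma)*g2||^2."""
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:  # identical gradients: every gamma gives the same norm
        return 0.5
    return float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))


def min_norm_weights(grads: np.ndarray, iters: int = 250, tol: float = 1e-10) -> np.ndarray:
    """Frank-Wolfe approximation of argmin_{alpha in simplex} ||alpha @ grads||^2.

    grads has shape (T, d): one flattened gradient per task (hypothetical layout).
    """
    num_tasks = grads.shape[0]
    gram = grads @ grads.T  # every later step uses only these T x T inner products
    alpha = np.full(num_tasks, 1.0 / num_tasks)  # start from uniform weights
    for _ in range(iters):
        # Linear minimization over the simplex selects the vertex (task) whose
        # gradient has the smallest inner product with the current combination.
        t = int(np.argmin(gram @ alpha))
        v1v1 = float(alpha @ gram @ alpha)  # ||current combination||^2
        v1v2 = float(alpha @ gram[:, t])    # <current combination, grads[t]>
        v2v2 = float(gram[t, t])            # ||grads[t]||^2
        denom = v1v1 - 2.0 * v1v2 + v2v2    # ||current - grads[t]||^2
        if denom <= 0.0:  # the chosen vertex coincides with the current point
            break
        # Exact line search on ||(1 - step) * current + step * grads[t]||^2.
        step = float(np.clip((v1v1 - v1v2) / denom, 0.0, 1.0))
        new_alpha = (1.0 - step) * alpha
        new_alpha[t] += step
        if np.abs(new_alpha - alpha).sum() < tol:
            break
        alpha = new_alpha
    return alpha


g1 = np.array([1.0, 1.0])
g2 = np.array([1.0, -3.0])
print(two_task_weight(g1, g2))               # 0.75
weights = min_norm_weights(np.stack([g1, g2]))
print(weights)                               # [0.75 0.25]
direction = -(weights @ np.stack([g1, g2]))  # shared-parameter update direction
```

For two tasks the Frank-Wolfe solver reaches the same weights as the closed form. Unlike a fixed weighted linear combination of losses, the weights here are recomputed from the gradients at every step.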
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic Segmentation | Cityscapes (test) | mIoU | 68.84 | 1145 |
| Semantic Segmentation | Cityscapes | mIoU | 66.63 | 578 |
| Depth Estimation | NYU v2 (test) | -- | -- | 423 |
| Image Classification | CUB | Accuracy | 86.3 | 249 |
| Semantic Segmentation | NYU v2 (test) | mIoU | 50.79 | 248 |
| Surface Normal Estimation | NYU v2 (test) | Mean Angle Distance (MAD) | 23.14 | 206 |
| Image Classification | Office-Home (test) | -- | -- | 199 |
| Depth Estimation | NYU Depth V2 | RMSE | 0.603 | 177 |
| Facial Attribute Classification | CelebA | -- | -- | 163 |
| Classification | CelebA | Avg Accuracy | 87.7 | 137 |