# DenseMTL: Cross-task Attention Mechanism for Dense Multi-task Learning

## About
Multi-task learning has recently emerged as a promising solution for a comprehensive understanding of complex scenes. In addition to being memory-efficient, multi-task models, when appropriately designed, can facilitate the exchange of complementary signals across tasks. In this work, we jointly address 2D semantic segmentation and three geometry-related tasks: dense depth estimation, surface normal estimation, and edge estimation, demonstrating their benefits on both indoor and outdoor datasets. We propose a novel multi-task learning architecture that leverages pairwise cross-task exchange through correlation-guided attention and self-attention to enhance the overall representation learning for all tasks. We conduct extensive experiments across three multi-task setups, showing the advantages of our approach over competitive baselines on both synthetic and real-world benchmarks. Additionally, we extend our method to the novel multi-task unsupervised domain adaptation setting. Our code is available at https://github.com/cv-rits/DenseMTL
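To illustrate the idea of pairwise cross-task exchange, here is a minimal NumPy sketch of correlation-guided cross-attention between the feature maps of two tasks. This is not the DenseMTL implementation; the function and variable names (`cross_task_attention`, `seg_feat`, `depth_feat`) are hypothetical, and the sketch only shows the general pattern of one task's features attending to another's:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_task_attention(feat_a, feat_b):
    """Sketch of pairwise cross-task attention (assumed simplification):
    task A's features (queries) attend to task B's features (keys/values),
    so A's representation is refined with complementary signals from B.

    feat_a, feat_b: (N, C) arrays -- N spatial locations, C channels.
    """
    scale = feat_a.shape[-1] ** -0.5
    corr = feat_a @ feat_b.T * scale      # (N, N) correlation between locations
    attn = softmax(corr, axis=-1)         # row-normalized attention weights
    exchanged = attn @ feat_b             # (N, C) message passed from B to A
    return feat_a + exchanged             # residual fusion into task A's stream

rng = np.random.default_rng(0)
seg_feat = rng.standard_normal((16, 8))    # e.g. segmentation features
depth_feat = rng.standard_normal((16, 8))  # e.g. depth features
refined = cross_task_attention(seg_feat, depth_feat)
print(refined.shape)  # (16, 8)
```

In the full model, such an exchange would run for every ordered task pair alongside per-task self-attention, with the refined features fed to each task's decoder head.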
## Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Depth Estimation | NYU v2 (test) | -- | 423 |
| Surface Normal Estimation | NYU v2 (test) | -- | 206 |
| Semantic Segmentation | NYUD v2 (test) | mIoU 40.84 | 187 |
| Multi-Task Learning | Cityscapes (test) | MR 40.05 | 43 |
| Edge Detection | NYUD v2 (test) | -- | 16 |
| Semantic Segmentation | SYNTHIA to Cityscapes 16 classes (test) | mIoU 37.93 | 13 |
| Multi-Task Learning | Synthia (test) | mIoU 82.99 | 10 |
| Multi-Task Learning | vKITTI 2 (test) | mIoU 97.53 | 10 |
| Monocular Depth Estimation | SYNTHIA to Cityscapes 16 classes UDA (val) | RMSE 11.66 | 9 |
| Multi-Task Learning Overall Improvement | NYUD v2 (test) | ΔSD (%) 5.8 | 8 |