Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
About
The vast majority of deep models use multiple gradient signals, typically corresponding to a sum of multiple loss terms, to update a shared set of trainable weights. However, these multiple updates can impede optimal training by pulling the model in conflicting directions. We present Gradient Sign Dropout (GradDrop), a probabilistic masking procedure which samples gradients at an activation layer based on their level of consistency. GradDrop is implemented as a simple deep layer that can be used in any deep net and synergizes with other gradient balancing approaches. We show that GradDrop outperforms the state-of-the-art multiloss methods within traditional multitask and transfer learning settings, and we discuss how GradDrop reveals links between optimal multiloss training and gradient stochasticity.
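The masking step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The abstract does not give the formula, so the sketch assumes a per-element "sign purity" score of the form 0.5 * (1 + sum(g) / sum(|g|)) and uses it as the probability of keeping the positive-signed gradients. The function name `grad_drop`, the `eps` parameter, and the list-of-lists input layout are hypothetical choices made for this example.

```python
import random


def grad_drop(task_grads, rng=None, eps=1e-12):
    """Combine per-task gradients at one activation layer, keeping one sign per element.

    task_grads: list of equal-length lists, one list of gradient values per task.
    Returns a single list holding the combined gradient.
    NOTE: the purity formula below is an assumption of this sketch.
    """
    rng = rng or random.Random()
    combined = []
    for j in range(len(task_grads[0])):
        column = [g[j] for g in task_grads]
        total = sum(column)
        total_abs = sum(abs(v) for v in column)
        # Assumed sign purity in [0, 1]: near 1 if all gradients are positive,
        # near 0 if all are negative, 0.5 if the signs cancel exactly.
        purity = 0.5 * (1.0 + total / (total_abs + eps))
        # Sample which sign survives for this element.
        if purity > rng.random():
            kept = sum(v for v in column if v > 0)
        else:
            kept = sum(v for v in column if v < 0)
        combined.append(kept)
    return combined
```

When every task agrees on the sign of an element, the gradients pass through as a plain sum. When they conflict, only one sign survives, chosen with a probability that favors the side with the larger total magnitude.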
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | Cityscapes (test) | mIoU 75.27 | 1145 |
| Depth Estimation | NYU v2 (test) | -- | 423 |
| Semantic segmentation | NYU v2 (test) | mIoU 39.39 | 248 |
| Surface Normal Estimation | NYU v2 (test) | Mean Angle Distance (MAD) 27.48 | 206 |
| Semantic segmentation | NYU Depth V2 (test) | mIoU 39.39 | 172 |
| Multi-task Learning | Cityscapes (test) | MR 5.5 | 43 |
| Depth Estimation | Cityscapes (test) | Abs Err 0.0157 | 40 |
| Multi-task Learning | NYU v2 (test) | Delta m% 3.58 | 31 |
| Depth Estimation | Cityscapes | Abs. Err. 0.0173 | 22 |
| Multi-task Learning (Segmentation, Depth, Surface Normal) | NYU v2 (test) | mIoU 39.39 | 14 |