Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout

About

The vast majority of deep models use multiple gradient signals, typically corresponding to a sum of multiple loss terms, to update a shared set of trainable weights. However, these multiple updates can impede optimal training by pulling the model in conflicting directions. We present Gradient Sign Dropout (GradDrop), a probabilistic masking procedure which samples gradients at an activation layer based on their level of consistency. GradDrop is implemented as a simple deep layer that can be used in any deep net and synergizes with other gradient balancing approaches. We show that GradDrop outperforms the state-of-the-art multiloss methods within traditional multitask and transfer learning settings, and we discuss how GradDrop reveals links between optimal multiloss training and gradient stochasticity.
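The masking procedure described above can be illustrated with a small sketch. The abstract does not give the sampling rule, so this assumes one common formulation of sign-based masking: at each activation position a "positive sign purity" P = 0.5 * (1 + sum(g) / sum(|g|)) is computed over the per-loss gradients, then either the positive or the negative gradients are kept, with probability P of keeping the positive ones. The function name `grad_drop` and its parameters are hypothetical, and plain Python lists stand in for tensors.

```python
import random


def grad_drop(task_grads, rng=None):
    """Combine per-loss gradients at one activation layer by sign-based masking.

    task_grads: list of equal-length lists, one gradient vector per loss term.
    rng: optional random.Random instance (hypothetical parameter, for repeatability).
    Returns one combined gradient vector as a list of floats.
    """
    rng = rng or random.Random()
    combined = []
    for j in range(len(task_grads[0])):
        # Gradients that every loss sends to activation position j.
        column = [g[j] for g in task_grads]
        total = sum(column)
        total_abs = sum(abs(v) for v in column)
        if total_abs == 0.0:
            # No gradient signal at this position; either sign gives zero.
            purity = 0.5
        else:
            # Assumed purity measure in [0, 1]:
            # 1.0 if all gradients are positive, 0.0 if all are negative.
            purity = 0.5 * (1.0 + total / total_abs)
        # Keep the positive gradients with probability `purity`,
        # otherwise keep the negative ones.
        if rng.random() < purity:
            combined.append(sum(v for v in column if v > 0))
        else:
            combined.append(sum(v for v in column if v < 0))
    return combined


# Two losses, three activation positions.
grads = [[1.0, -2.0, 3.0], [2.0, -1.0, -3.0]]
result = grad_drop(grads, rng=random.Random(0))
```

In this example the first two positions have gradients that agree in sign, so they are summed unchanged. The third position is in full conflict (3.0 versus -3.0), so the sketch keeps one sign at random rather than letting the two cancel to zero.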

Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov • 2020

Related benchmarks

Task	Dataset	Result
Semantic segmentation	Cityscapes (test)	mIoU 75.27	1252
Semantic segmentation	Cityscapes	mIoU 75.27	668
Depth Estimation	NYU v2 (test)	--	435
Semantic segmentation	NYU v2 (test)	mIoU 39.39	282
Surface Normal Estimation	NYU v2 (test)	Mean Angle Distance (MAD) 27.48	224
Semantic segmentation	NYU Depth V2 (test)	mIoU 39.39	183
Surface Normal Prediction	NYU V2	Mean Error 27.48	123
Depth Estimation	Cityscapes	Abs. Err. 0.0157	65
Multi-task Learning	Cityscapes (test)	MR 5.5	43
Depth Estimation	Cityscapes (test)	Abs Err 0.0157	40

Showing 10 of 28 rows
