
Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation

About

Training deep networks for semantic segmentation requires large amounts of labeled training data, which presents a major challenge in practice, as labeling segmentation masks is a highly labor-intensive process. To address this issue, we present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences. In particular, we propose three key contributions: (1) We transfer knowledge from features learned during self-supervised depth estimation to semantic segmentation, (2) we implement a strong data augmentation by blending images and labels using the geometry of the scene, and (3) we utilize the depth feature diversity as well as the level of difficulty of learning depth in a student-teacher framework to select the most useful samples to be annotated for semantic segmentation. We validate the proposed model on the Cityscapes dataset, where all three modules demonstrate significant performance gains, and we achieve state-of-the-art results for semi-supervised semantic segmentation. The implementation is available at https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth.
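Contribution (2), blending images and labels using scene geometry, can be illustrated with a minimal sketch: given two images with per-pixel depth estimates, pixels from one image are pasted over the other wherever they are closer to the camera, so foreground objects occlude the background plausibly. The function name and exact masking rule below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def depth_aware_mix(img_a, img_b, depth_a, depth_b, labels_a, labels_b):
    """Blend two images and their label maps using estimated depth.

    Pixels are taken from image A where A's depth is smaller (closer
    to the camera) than B's, and from image B elsewhere, yielding a
    geometrically plausible composite and matching segmentation labels.
    """
    mask = depth_a < depth_b                       # HxW boolean occlusion mask
    mixed_img = np.where(mask[..., None], img_a, img_b)  # broadcast over channels
    mixed_lbl = np.where(mask, labels_a, labels_b)
    return mixed_img, mixed_lbl
```

In a semi-supervised setting, such a composite can be paired with pseudo-labels for the unlabeled image, so the augmentation also acts as a consistency-training signal.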

Lukas Hoyer, Dengxin Dai, Yuhua Chen, Adrian Köring, Suman Saha, Luc Van Gool • 2020

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic Segmentation | Cityscapes (test) | mIoU 75 | 1145 |
| Depth Estimation | NYU v2 (test) | -- | 423 |
| Surface Normal Estimation | NYU v2 (test) | -- | 206 |
| Semantic Segmentation | NYUD v2 (test) | mIoU 40.28 | 187 |
| Semantic Segmentation | NYU Depth V2 (test) | mIoU 39.47 | 172 |
| Depth Prediction | Cityscapes (test) | RMSE 6.528 | 52 |
| Multi-task Learning | Cityscapes (test) | MR 41.84 | 43 |
| Edge Detection | NYUD v2 (test) | -- | 16 |
| Multi-task Learning | Synthia (test) | mIoU 79.93 | 10 |
| Multi-task Learning | vKITTI 2 (test) | mIoU 96.87 | 10 |

Showing 10 of 11 rows.
