
PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing

About

Depth estimation and scene parsing are two particularly important tasks in visual scene understanding. In this paper we tackle the problem of simultaneous depth estimation and scene parsing in a joint CNN. The task is typically treated as a deep multi-task learning problem [42]. Unlike previous methods that directly optimize multiple tasks given the input training data, this paper proposes a novel multi-task guided prediction-and-distillation network (PAD-Net). PAD-Net first predicts a set of intermediate auxiliary tasks ranging from low level to high level, and then feeds the predictions from these intermediate auxiliary tasks, as multi-modal input, through our proposed multi-modal distillation modules to produce the final tasks. During joint learning, the intermediate tasks not only act as supervision for learning more robust deep representations but also provide rich multi-modal information for improving the final tasks. Extensive experiments are conducted on two challenging datasets (i.e., NYUD-v2 and Cityscapes) for both the depth estimation and scene parsing tasks, demonstrating the effectiveness of the proposed approach.
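The predict-then-distill data flow described above can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the auxiliary task names (`aux_low`, `aux_mid`, `aux_high`), the channel counts, the random per-pixel linear heads, and the use of plain concatenation as the fusion step are all assumptions made for illustration. It only shows the ordering of the stages: shared features, then intermediate predictions, then fusion, then the two final tasks.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_head(in_ch, out_ch):
    """Return a per-pixel linear map (equivalent to a 1x1 convolution)."""
    w = rng.standard_normal((in_ch, out_ch)) * 0.1
    return lambda x: x @ w  # (H, W, in_ch) -> (H, W, out_ch)

H, W, FEAT = 8, 8, 16  # hypothetical feature-map size and channel count

# Hypothetical intermediate auxiliary tasks and their output channels.
aux_channels = {"aux_low": 1, "aux_mid": 3, "aux_high": 5}
aux_heads = {name: linear_head(FEAT, c) for name, c in aux_channels.items()}

# Assumed fusion: concatenate all intermediate predictions along channels.
fused_ch = sum(aux_channels.values())
final_heads = {
    "depth": linear_head(fused_ch, 1),    # one depth value per pixel
    "parsing": linear_head(fused_ch, 5),  # per-pixel scores for 5 classes
}

def forward(features):
    # Stage 1: predict every intermediate auxiliary task from shared features.
    aux = {name: head(features) for name, head in aux_heads.items()}
    # Stage 2: fuse the intermediate predictions into one multi-modal input.
    fused = np.concatenate([aux[name] for name in aux_channels], axis=-1)
    # Stage 3: predict the final tasks from the fused representation.
    final = {name: head(fused) for name, head in final_heads.items()}
    return aux, final

features = rng.standard_normal((H, W, FEAT))  # stand-in for CNN features
aux, final = forward(features)
```

In a trained model, both the `aux` outputs and the `final` outputs would receive their own loss terms, which is how the intermediate tasks can act as extra supervision while also feeding the final predictions.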

Dan Xu, Wanli Ouyang, Xiaogang Wang, Nicu Sebe • 2018

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | Cityscapes (test) | mIoU 80.3 | 1154 |
| Depth Estimation | NYU v2 (test) | Threshold Accuracy (delta < 1.25) 81.7 | 432 |
| Semantic segmentation | PASCAL Context (val) | mIoU 53.6 | 360 |
| Semantic segmentation | Cityscapes (val) | mIoU 76.1 | 297 |
| Semantic segmentation | NYU v2 (test) | mIoU 50.2 | 282 |
| Surface Normal Estimation | NYU v2 (test) | Mean Angle Distance (MAD) 20.85 | 224 |
| Depth Estimation | NYU Depth V2 | RMSE 0.582 | 209 |
| Semantic segmentation | NYUD v2 (test) | mIoU 36.61 | 187 |
| Semantic segmentation | NYU Depth V2 (test) | mIoU 50.2 | 183 |
| Semantic segmentation | NYUD v2 | mIoU 50.2 | 125 |
Showing 10 of 46 rows
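
The metric names in the table are not defined on this page. The sketch below assumes the definitions commonly used for these benchmarks: threshold accuracy as the fraction of pixels whose ratio max(pred/gt, gt/pred) is below 1.25, RMSE as the root of the mean squared depth error, and mIoU as the per-class intersection over union averaged across classes that occur. The function names and toy inputs are illustrative only, and the functions return fractions, whereas the table appears to report accuracy and mIoU as percentages.

```python
import numpy as np

def threshold_accuracy(pred, gt, thresh=1.25):
    """Fraction of pixels with max(pred/gt, gt/pred) < thresh (depths > 0)."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < thresh))

def rmse(pred, gt):
    """Root mean squared error between predicted and ground-truth depth."""
    return float(np.sqrt(np.mean((pred - gt) ** 2)))

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy depth values: two exact pixels, one off by a factor of 2.
depth_pred = np.array([1.0, 2.0, 4.0])
depth_gt = np.array([1.0, 2.0, 2.0])
acc = threshold_accuracy(depth_pred, depth_gt)  # 2 of 3 pixels pass
err = rmse(depth_pred, depth_gt)

# Toy label maps with two classes.
seg_pred = np.array([0, 0, 1, 1])
seg_gt = np.array([0, 1, 1, 1])
miou = mean_iou(seg_pred, seg_gt, num_classes=2)  # mean of 1/2 and 2/3
```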
