Recurrent Scene Parsing with Perspective Understanding in the Loop

About

Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth) so that small details are preserved for distant objects while larger receptive fields are used for those nearby. The depth gating signal is provided by stereo disparity or estimated directly from monocular input. We integrate this depth-aware gating into a recurrent convolutional neural network to perform semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth and semantic predictions from the previous iterations. Through extensive experiments on four popular large-scale RGB-D datasets, we demonstrate this approach achieves competitive semantic segmentation performance with a model which is substantially more compact. We carry out extensive analysis of this architecture including variants that operate on monocular RGB but use depth as side-information during training, unsupervised gating as a generic attentional mechanism, and multi-resolution gating. We find that gated pooling for joint semantic segmentation and depth yields state-of-the-art results for quantitative monocular depth estimation.
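The depth-gated pooling idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it uses plain box averaging as a stand-in for the network's pooling branches, and the function names (`box_average`, `depth_gated_pooling`), the quantile-based depth binning, and the window sizes are all assumptions made for illustration. It only shows the selection rule: nearby pixels get a large pooling window, distant pixels a small one.

```python
import numpy as np


def box_average(feat, k):
    """Average a 2-D map over a k x k window (stride 1, edge padding, odd k)."""
    pad = k // 2
    padded = np.pad(feat, pad, mode="edge")
    h, w = feat.shape
    out = np.zeros((h, w), dtype=float)
    # Sum the k*k shifted copies of the padded map, then normalise.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def depth_gated_pooling(feat, depth, window_sizes=(9, 5, 1)):
    """Pick a pooling window per pixel from its depth.

    window_sizes is ordered from nearest to farthest depth bin, so the
    window shrinks as depth grows (hypothetical default values).
    """
    n = len(window_sizes)
    # Assumption: split the depth range into n bins at equal quantiles.
    edges = np.quantile(depth, np.linspace(0.0, 1.0, n + 1)[1:-1])
    bins = np.digitize(depth, edges)          # 0 = nearest, n-1 = farthest
    # One pooled map per candidate window size, shape (n, H, W).
    pooled = np.stack([box_average(feat, k) for k in window_sizes])
    # One-hot gate per pixel, shape (H, W, n), selects a single branch.
    gate = np.eye(n)[bins]
    return np.einsum("hwn,nhw->hw", gate, pooled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.random((6, 8))
    depth = np.ones((6, 8))
    depth[:, 4:] = 10.0                       # right half is far away
    out = depth_gated_pooling(feat, depth, window_sizes=(3, 1))
    print(out.shape)
```

In this toy setup the far half of the map passes through unchanged (window size 1) while the near half is smoothed by a 3 x 3 average. A hard one-hot gate is the simplest choice; a learned or soft gate, as in the monocular variant the abstract mentions, would replace the `np.eye(n)[bins]` step with predicted per-pixel weights.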

Shu Kong, Charless Fowlkes • 2017

Related benchmarks

Task                   Dataset                 Result                    Rank
Semantic segmentation  Cityscapes (test)       mIoU 78.2                 1145
Semantic segmentation  Cityscapes              mIoU 75.4                 578
Semantic segmentation  Cityscapes (val)        mIoU 79.1                 572
Semantic segmentation  Cityscapes (val)        mIoU 79.1                 287
Semantic segmentation  SUN RGB-D (test)        mIoU 45.1                 191
Semantic segmentation  NYUD v2 (test)          mIoU 44.5                 187
Semantic segmentation  NYU Depth V2 (test)     mIoU 46.5                 172
Depth Prediction       NYU Depth V2 (test)     Accuracy (δ < 1.25) 81.6  113
Semantic segmentation  NYUDv2 40-class (test)  mIoU 44.5                 99
Semantic segmentation  SUN-RGBD (test)         mIoU 45.1                 77
Showing 10 of 18 rows
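
For readers unfamiliar with the two metrics in the table, the sketch below shows their usual definitions: mean intersection-over-union (mIoU) for segmentation and threshold accuracy (δ < 1.25) for depth. The function names are hypothetical, and this per-array version is a simplification: benchmark toolkits typically accumulate a confusion matrix over the whole dataset and skip unlabeled pixels, and the table values appear to be reported as percentages rather than fractions.

```python
import numpy as np


def mean_iou(pred, gt, num_classes):
    """Mean of per-class intersection / union, skipping absent classes."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:                 # class appears in pred or gt
            ious.append(inter / union)
    return float(np.mean(ious))


def delta_accuracy(pred_depth, gt_depth, threshold=1.25):
    """Fraction of pixels with max(pred/gt, gt/pred) below the threshold."""
    ratio = np.maximum(pred_depth / gt_depth, gt_depth / pred_depth)
    return float(np.mean(ratio < threshold))


if __name__ == "__main__":
    pred = np.array([0, 0, 1, 1])
    gt = np.array([0, 1, 1, 1])
    print(mean_iou(pred, gt, num_classes=2))

    pred_d = np.array([1.0, 2.0, 4.0])
    gt_d = np.array([1.0, 2.0, 2.0])
    print(delta_accuracy(pred_d, gt_d))
```

Both functions assume strictly positive depths and integer class labels in `[0, num_classes)`.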