3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation

About

We present 3DMV, a novel method for 3D semantic scene segmentation of RGB-D scans in indoor environments using a joint 3D-multi-view prediction network. In contrast to existing methods that use either geometry or RGB data as input for this task, we combine both data modalities in a joint, end-to-end network architecture. Rather than simply projecting color data into a volumetric grid and operating solely in 3D, which would result in insufficient detail, we first extract feature maps from associated RGB images. These features are then mapped into the volumetric feature grid of a 3D network using a differentiable backprojection layer. Since our target is 3D scanning scenarios with possibly many frames, we use a multi-view pooling approach to handle a varying number of RGB input views. This learned combination of RGB and geometric features with our joint 2D-3D architecture achieves significantly better results than existing baselines. For instance, our final result on the ScanNet 3D segmentation benchmark increases from 52.8% to 75% accuracy compared to existing volumetric architectures.

Angela Dai, Matthias Nießner • 2018
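
The multi-view pooling step described in the abstract can be illustrated with a small NumPy sketch. It assumes that per-view 2D features have already been backprojected into a shared voxel grid, and that pooling is an element-wise max over views. The function name `multi_view_max_pool`, the array layout, and the visibility mask are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def multi_view_max_pool(view_features, valid_mask):
    """Pool per-view voxel features into a single feature grid.

    view_features: float array of shape (V, C, X, Y, Z), one backprojected
                   feature volume per view (hypothetical layout).
    valid_mask:    bool array of shape (V, X, Y, Z), True where a view
                   actually observes a voxel.
    Returns an array of shape (C, X, Y, Z). Voxels seen by no view are zero.
    """
    mask = valid_mask[:, None]  # (V, 1, X, Y, Z), broadcasts over channels
    # Unobserved entries are set to -inf so they can never win the max.
    masked = np.where(mask, view_features, -np.inf)
    pooled = masked.max(axis=0)  # element-wise max over the view axis
    seen = valid_mask.any(axis=0)[None]  # (1, X, Y, Z)
    # Replace the -inf left in voxels that no view observed.
    return np.where(seen, pooled, 0.0)


# Two views, one channel, a 1x1x2 grid; only the first voxel is observed.
feats = np.zeros((2, 1, 1, 1, 2))
feats[0, 0, 0, 0, 0] = 1.0
feats[1, 0, 0, 0, 0] = 5.0
mask = np.zeros((2, 1, 1, 2), dtype=bool)
mask[:, 0, 0, 0] = True
pooled = multi_view_max_pool(feats, mask)
```

Because the maximum is taken over the view axis, the output shape does not depend on the number of views, which is what lets a pooling layer of this kind accept a varying number of RGB frames.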

Related benchmarks

Task                      Dataset                  Result                Rank
Semantic segmentation     ScanNet v2 (test)        mIoU 49.8             248
3D Semantic Segmentation  ScanNet v2 (test)        mIoU 48.4             110
3D Semantic Segmentation  ScanNet (test)           mIoU 48.4             105
Semantic segmentation     ScanNet (test)           mIoU 49.8             59
3D Semantic Segmentation  ScanNet20 v2 (test)      mIoU 48.4             24
Semantic segmentation     NYUv2 13-class labeling  Accuracy 71.2         12
3D Semantic Segmentation  Matterport3D (test)      Wall Accuracy 79.6    12
3D Semantic Segmentation  ScanNet                  Semantics mIoU 49.22  11
2D Semantic Segmentation  ScanNet v2 (test)        mIoU 49.8             10
2D Semantic Segmentation  NYU2 11-class task       Mean Accuracy 71.2    7
