3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation

About

We present 3DMV, a novel method for 3D semantic scene segmentation of RGB-D scans in indoor environments using a joint 3D-multi-view prediction network. In contrast to existing methods that use either geometry or RGB data as input for this task, we combine both data modalities in a joint, end-to-end network architecture. Rather than simply projecting color data into a volumetric grid and operating solely in 3D, which would result in insufficient detail, we first extract feature maps from associated RGB images. These features are then mapped into the volumetric feature grid of a 3D network using a differentiable backprojection layer. Since our target is 3D scanning scenarios with possibly many frames, we use a multi-view pooling approach in order to handle a varying number of RGB input views. This learned combination of RGB and geometric features with our joint 2D-3D architecture achieves significantly better results than existing baselines. For instance, our final result on the ScanNet 3D segmentation benchmark increases from 52.8% to 75% accuracy compared to existing volumetric architectures.
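
To make the multi-view pooling step concrete, here is a minimal sketch in plain Python. It assumes element-wise max pooling over views, which is one plausible choice; the abstract only says "multi-view pooling" and does not name the operator. The function name `pool_views`, the flat voxel indexing, and the use of `None` for voxels a view does not observe are all hypothetical and chosen for illustration. The sketch is not the authors' implementation.

```python
from typing import List, Optional, Sequence

Feature = List[float]


def pool_views(
    view_grids: Sequence[Sequence[Optional[Feature]]],
    feat_dim: int,
) -> List[Feature]:
    """Pool backprojected per-view features into one feature per voxel.

    view_grids[i][v] is the feature that view i backprojects into voxel v,
    or None if view i does not observe that voxel (assumed convention).
    The number of views may vary; the output size does not depend on it.
    """
    if not view_grids:
        raise ValueError("need at least one view")
    num_voxels = len(view_grids[0])
    pooled: List[Feature] = []
    for v in range(num_voxels):
        # Collect the features of every view that observes this voxel.
        seen = [grid[v] for grid in view_grids if grid[v] is not None]
        if seen:
            # Element-wise max across views (assumed pooling operator).
            pooled.append([max(f[c] for f in seen) for c in range(feat_dim)])
        else:
            # Voxel not observed by any view: fall back to zeros.
            pooled.append([0.0] * feat_dim)
    return pooled


# Two views over three voxels, with 2-channel features.
views = [
    [[1.0, 5.0], None, [0.5, 0.5]],
    [[3.0, 2.0], None, None],
]
result = pool_views(views, feat_dim=2)
```

Because the pooling reduces over the view axis, the same function accepts one view or many, which is the property the abstract relies on for scans with a varying number of frames.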

Angela Dai, Matthias Nießner • 2018

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ScanNet v2 (test)	mIoU 49.8	248
3D Semantic Segmentation	ScanNet (test)	mIoU 48.4	117
3D Semantic Segmentation	ScanNet v2 (test)	mIoU 48.4	110
Semantic segmentation	ScanNet (test)	mIoU 49.8	64
3D Semantic Segmentation	Matterport3D (test)	--	32
3D Semantic Segmentation	ScanNet20 v2 (test)	mIoU 48.4	24
3D Semantic Segmentation	ScanNet	Semantics mIoU 49.22	19
Semantic segmentation	NYUv2 13-class labeling	Accuracy 71.2	12
2D Semantic Segmentation	ScanNet v2 (test)	mIoU 49.8	10
2D Semantic Segmentation	NYU2 11-class task	Mean Accuracy 71.2	7

