
Attention-based Multi-modal Fusion Network for Semantic Scene Completion

About

This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Compared with previous methods, which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously by leveraging the experience of inferring 2D semantic segmentation from RGB-D images as well as the reliable depth cues in the spatial dimension. This is achieved by employing a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset; the results show that our method achieves gains of 2.5% and 2.6% on SUNCG-RGBD and NYUv2, respectively, over the state-of-the-art method.
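The abstract describes the 3D completion network as "empowered by residual attention blocks." The authors' exact block design is defined in the paper; purely as an illustration of the general pattern (output = input + attention-gated features), a minimal sketch in NumPy is shown below. The channel-wise 1x1x1 gating and the sigmoid mask here are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_attention_block(feat, w_attn):
    """Generic residual attention sketch: compute a soft attention mask,
    gate the features with it, and add the input back via an identity
    shortcut.

    feat   : (C, D, H, W) 3D feature volume
    w_attn : (C, C) hypothetical 1x1x1 convolution weights for the mask
    """
    # A 1x1x1 convolution over channels is a matrix product along axis 0.
    mask_logits = np.einsum('oc,cdhw->odhw', w_attn, feat)
    mask = sigmoid(mask_logits)   # attention mask, values in (0, 1)
    return feat + feat * mask     # residual (shortcut) connection

# Toy example: 4 channels over a 2x2x2 voxel grid.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 2, 2, 2))
w = rng.standard_normal((4, 4))
y = residual_attention_block(x, w)
print(y.shape)  # (4, 2, 2, 2)
```

Because the mask lies in (0, 1), each output value keeps the sign of the input and its magnitude lies between 1x and 2x the input's, which is the stabilizing property residual attention designs typically rely on.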

Siqi Li, Changqing Zou, Yipeng Li, Xibin Zhao, Yue Gao • 2020

Related benchmarks

Task                       Dataset             Metric            Result  Rank
Semantic Scene Completion  NYU v2 (test)       Ceiling Error     13.7    72
Scene Completion           NYU v2 (test)       mIoU              59      48
Semantic Scene Completion  SUNCG-RGBD (test)   Ceiling Accuracy  81.3    13
Scene Completion           SUNCG-RGBD (test)   Precision         60.6    12
