Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

About

Semantic reconstruction of indoor scenes refers to both scene understanding and object reconstruction. Existing works either address one part of this problem or focus on independent objects. In this paper, we bridge the gap between understanding and reconstruction, and propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image. Instead of separately resolving scene understanding and object reconstruction, our method builds upon a holistic scene context and proposes a coarse-to-fine hierarchy with three components: 1. room layout with camera pose; 2. 3D object bounding boxes; 3. object meshes. We argue that understanding the context of each component can assist the task of parsing the others, which enables joint understanding and reconstruction. The experiments on the SUN RGB-D and Pix3D datasets demonstrate that our method consistently outperforms existing methods in indoor layout estimation, 3D object detection and mesh reconstruction.

Yinyu Nie, Xiaoguang Han, Shihui Guo, Yujian Zheng, Jian Chang, Jian Jun Zhang • 2020

Related benchmarks

Task	Dataset	Result
3D Object Detection	SUN RGB-D (test)	--	64
Scene Generation	MIDI (test)	CD-S 27	18
3D Object Detection	SUN RGB-D v1 (test)	Bed AP 60.65	18
3D Layout Estimation	SUN RGB-D	IoU 59.2	14
Object Reconstruction	Synthetic dataset	CD-O 18.02	9
Camera pose estimation	SUN RGB-D	Pitch 3.15	9
Reconstruction	Replica	Depth L1 0.78	9
3D Shape Reconstruction	Pix3D (test)	F-Score 36.2	9
3D Layout Estimation	SUN RGB-D v1 (test)	Average IoU 59.2	8
3D Object Detection	Structured3D v1 (test)	Bed AP 54.16	8
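
Several rows above report an IoU (intersection over union) score for 3D layout estimation. As a rough illustration of what that metric measures, the sketch below computes IoU for two axis-aligned 3D boxes. This is a simplification and an assumption on our part: the benchmark protocols generally use oriented boxes, and the `Box3D` class and `iou_3d` function are hypothetical helpers, not part of the paper's released code.

```python
from dataclasses import dataclass
from typing import Tuple

Point3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned 3D box (hypothetical helper, not from the paper)."""
    min_corner: Point3
    max_corner: Point3

    def volume(self) -> float:
        # Product of the three side lengths; degenerate sides count as zero.
        v = 1.0
        for lo, hi in zip(self.min_corner, self.max_corner):
            v *= max(0.0, hi - lo)
        return v


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection over union of two axis-aligned boxes, in [0, 1]."""
    inter = 1.0
    for a_lo, a_hi, b_lo, b_hi in zip(
        a.min_corner, a.max_corner, b.min_corner, b.max_corner
    ):
        overlap = min(a_hi, b_hi) - max(a_lo, b_lo)
        if overlap <= 0.0:
            # No overlap along this axis means the boxes do not intersect.
            return 0.0
        inter *= overlap
    union = a.volume() + b.volume() - inter
    return inter / union if union > 0.0 else 0.0


# Two 2x2x2 boxes shifted by 1 along x: intersection 4, union 12.
predicted = Box3D((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
ground_truth = Box3D((1.0, 0.0, 0.0), (3.0, 2.0, 2.0))
print(iou_3d(predicted, ground_truth))
```

Identical boxes score 1.0 and disjoint boxes score 0.0. The table values appear to be the same ratio expressed as a percentage, though the page does not state this.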

