
DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion

About

We propose DepR, a depth-guided single-view scene reconstruction framework that integrates instance-level diffusion within a compositional paradigm. Instead of reconstructing the entire scene holistically, DepR generates individual objects and subsequently composes them into a coherent 3D layout. Unlike previous methods that use depth solely for object layout estimation during inference and therefore fail to fully exploit its rich geometric information, DepR leverages depth throughout both training and inference. Specifically, we introduce depth-guided conditioning to effectively encode shape priors into diffusion models. During inference, depth further guides DDIM sampling and layout optimization, enhancing alignment between the reconstruction and the input image. Despite being trained on limited synthetic data, DepR achieves state-of-the-art performance and demonstrates strong generalization in single-view scene reconstruction, as shown through evaluations on both synthetic and real-world datasets.
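The abstract states that depth guides DDIM sampling at inference time. A common way to realize such guidance is to nudge the predicted clean sample toward depth consistency at each denoising step. The sketch below illustrates that generic pattern with a toy numpy DDIM loop; `eps_model` and `depth_grad_fn` are hypothetical stand-ins for DepR's trained diffusion model and the gradient of its depth-consistency objective, not the paper's actual implementation.

```python
import numpy as np

def ddim_sample_with_depth_guidance(eps_model, x_T, alphas_cumprod,
                                    depth_grad_fn, guidance_scale=1.0):
    """Toy deterministic DDIM sampler (eta = 0) with depth guidance.

    At each step the predicted clean latent x0 is corrected by a gradient
    that penalizes disagreement with the observed depth map. Both
    `eps_model(x, t)` and `depth_grad_fn(x0)` are assumed interfaces.
    """
    x = x_T
    T = len(alphas_cumprod)
    for t in reversed(range(T)):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else 1.0
        eps = eps_model(x, t)
        # Predict the clean sample from the current noisy latent.
        x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # Depth guidance: move x0 against the depth-consistency gradient.
        x0 = x0 - guidance_scale * depth_grad_fn(x0)
        # Deterministic DDIM update toward the previous noise level.
        x = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps
    return x

# Toy usage: a zero-noise model and a quadratic depth loss pulling
# toward a fixed target latent (purely illustrative).
target = np.ones(4)
rng = np.random.default_rng(0)
sample = ddim_sample_with_depth_guidance(
    eps_model=lambda x, t: np.zeros_like(x),
    x_T=rng.standard_normal(4),
    alphas_cumprod=np.linspace(0.99, 0.1, 10),
    depth_grad_fn=lambda x0: x0 - target,
    guidance_scale=1.0,
)
```

With full guidance strength and this quadratic toy loss, the sampler is pulled exactly onto the target latent; in practice the guidance scale trades off depth fidelity against the learned shape prior.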

Qingcheng Zhao, Xiang Zhang, Haiyang Xu, Zeyuan Chen, Jianwen Xie, Yuan Gao, Zhuowen Tu · 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Scene Generation | 3D-Front (test) | CD (Surface) | 0.104 | 28 |
| 3D Scene Reconstruction | ScanNet, Matterport3D, Pix3D | Runtime (s) | 1.2 | 9 |
| 3D Scene Reconstruction | 3D-FRONT | F Value | 3.20e+5 | 9 |
| Scene Reconstruction | 3D-FRONT | CD | 0.1532 | 8 |
| Object Reconstruction | 3D-FRONT | Chamfer Distance (CD) | 0.0026 | 7 |
| Object Pose Accuracy | 3D-FRONT | Box IoU | 36.67 | 7 |
| Single-image scene generation | 3D-Front (test) | Misorientation Rate | 11.83 | 6 |
| 3D Scene Reconstruction | CGTrader synthetic unseen and varied (test) | Chamfer Distance (CD) | 0.028 | 3 |
