SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model
About
We propose SceneMaker, a decoupled 3D scene generation framework. Because sufficient open-set de-occlusion and pose estimation priors are lacking, existing methods struggle to simultaneously produce high-quality geometry and accurate poses under severe occlusion and in open-set settings. To address these issues, we first decouple the de-occlusion model from 3D object generation and enhance it with image datasets and collected de-occlusion datasets that cover much more diverse open-set occlusion patterns. We then propose a unified pose estimation model that integrates global and local mechanisms in both self-attention and cross-attention to improve accuracy. In addition, we construct an open-set 3D scene dataset to further extend the generalization of the pose estimation model. Comprehensive experiments demonstrate the superiority of our decoupled framework on both indoor and open-set scenes. Our code and datasets are released at https://idea-research.github.io/SceneMaker/.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Scene Generation | 3D-Front (test) | CD (Surface) | 0.0381 | 12 |
| Scene Generation | MIDI (test) | CD-S | 5.1 | 9 |
| Scene Generation | Open-set (test) | CD-S | 15.38 | 4 |
| De-occlusion | Collected 1K images, 500 classes (val) | PSNR | 15.03 | 3 |
| Object Generation | 3D-Front rendered by InstPifu (test) | Chamfer Distance | 0.0409 | 3 |
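
The results above are reported in Chamfer Distance (CD, CD-S) and PSNR. As a rough illustration of what these metrics measure, the sketch below computes both in plain Python. It is not the evaluation code behind the table: the function names `chamfer_distance` and `psnr` are hypothetical, and the exact conventions are assumptions (unsquared Euclidean distances, per-direction means summed, pixel values in the range 0 to `max_value`). Benchmarks differ on these choices, as well as on point sampling and scaling, so values from this sketch are not directly comparable to the numbers listed.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumed convention: mean nearest-neighbour Euclidean distance in each
    direction, with the two directions summed.
    """
    def one_sided(src, dst):
        # Mean distance from each source point to its nearest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)


def psnr(image_a, image_b, max_value=1.0):
    """PSNR in dB between two equal-length flat sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(image_a, image_b)) / len(image_a)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Lower Chamfer distance means the generated surface lies closer to the ground truth, while higher PSNR means the de-occluded image is closer to the reference image. The brute-force nearest-neighbour search here is quadratic in the number of points, so it is only practical for small point sets.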