SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation
About
We present SceneFactor, a diffusion-based approach for large-scale 3D scene generation that enables controllable generation and effortless editing. SceneFactor performs text-guided 3D scene synthesis through a factored diffusion formulation, leveraging latent semantic and geometric manifolds to generate 3D scenes of arbitrary size. While text input enables easy, controllable generation, text guidance remains too imprecise for intuitive, localized editing and manipulation of the generated 3D scenes. Our factored semantic diffusion therefore generates a proxy semantic space composed of semantic 3D boxes. Generated scenes can be edited in a controlled way by adding, removing, or resizing these semantic 3D proxy boxes, which in turn guide high-fidelity, consistent 3D geometric editing. Extensive experiments demonstrate that the factored diffusion approach achieves high-fidelity 3D scene synthesis with effective, controllable editing.
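The box-level edits described above (adding, removing, and resizing semantic proxy boxes) can be illustrated on a toy semantic voxel grid. The following is a minimal sketch, not the SceneFactor implementation: the grid shape, the label ids, and the helper names `add_box`, `remove_box`, `resize_box`, and `edit_mask` are all hypothetical, and boxes are assumed to be axis-aligned.

```python
import numpy as np

EMPTY = 0  # hypothetical label id for free space


def add_box(grid, lo, hi, label):
    """Return a copy of `grid` with the axis-aligned box [lo, hi) set to `label`."""
    out = grid.copy()
    (x0, y0, z0), (x1, y1, z1) = lo, hi
    out[x0:x1, y0:y1, z0:z1] = label
    return out


def remove_box(grid, lo, hi):
    """Clear every voxel inside the box [lo, hi), whatever its label."""
    return add_box(grid, lo, hi, EMPTY)


def resize_box(grid, lo, hi, new_lo, new_hi):
    """Replace the box [lo, hi) with a box [new_lo, new_hi) of the same label."""
    label = grid[lo]  # label read at the box's minimum corner
    return add_box(remove_box(grid, lo, hi), new_lo, new_hi, label)


def edit_mask(before, after):
    """Boolean mask of voxels whose semantic label changed."""
    return before != after


# Toy scene: an 8 x 8 x 4 semantic grid holding one box with label 1.
scene = np.zeros((8, 8, 4), dtype=np.int64)
scene = add_box(scene, (1, 1, 0), (3, 3, 2), label=1)

# Stretch the box along x from 2 to 4 voxels.
edited = resize_box(scene, (1, 1, 0), (3, 3, 2), (1, 1, 0), (5, 3, 2))
mask = edit_mask(scene, edited)
```

In a factored pipeline of this kind, a mask like `mask` could mark the region whose geometry needs to be resynthesized while the rest of the scene is kept fixed. That use is an assumption made for illustration; the abstract does not specify how edits are propagated to geometry.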
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Scene Generation | 3D-FRONT | P(Tr) | 65 | 5 |
| 3D Scene Geometry Synthesis | 3D-FRONT Independent chunks | MMD (CD) | 0.019 | 5 |
| 3D Scene Geometry Synthesis | 3D-FRONT Independent chunks 1.0 (test) | MMD (CD) | 0.021 | 5 |
| Text-guided 3D scene generation | 3D Scenes with Qwen1.5 captions (Independent chunks) | CLIP-Score | 23.96 | 4 |
| Text-guided 3D scene generation | 3D Scenes with Qwen1.5 captions (Scene chunks) | CLIP-Score | 23.79 | 4 |
| Text-to-3D Scene Generation | 3D-FRONT Independent chunks | CLIP Score | 29.81 | 4 |
| Text-to-3D Scene Generation | 3D-FRONT (Scene chunks) | CLIP Score | 29.4 | 4 |
| 3D Scene Geometry Synthesis | 3D-FRONT Scene chunks Outpainted | MMD (CD) | 0.021 | 3 |
| 3D Scene Geometry Synthesis | 3D-FRONT Scene chunks 1.0 (test) | MMD (CD) | 0.026 | 3 |
| 3D Scene Synthesis | 3D-FRONT Independent chunks | MMD (CD) | 0.019 | 3 |
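
Several rows above report MMD (CD). Assuming this denotes minimum matching distance under Chamfer distance, as is common in 3D generation evaluation (lower is better), a minimal sketch of the metric is given below. The function names are hypothetical, and the exact convention behind the reported numbers (squared or unsquared distances, sum or mean of the two directions, number of sampled points) is not stated on this page, so the values are not directly comparable to the table.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Uses squared Euclidean distances and sums the two directional means;
    other conventions exist.
    """
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (N, M) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


def mmd_cd(generated, reference):
    """Average, over reference sets, of the distance to the closest generated set."""
    return float(np.mean([
        min(chamfer_distance(g, r) for g in generated)
        for r in reference
    ]))


# Two single-point "shapes" one unit apart along x.
p = np.array([[0.0, 0.0, 0.0]])
q = np.array([[1.0, 0.0, 0.0]])
score_far = mmd_cd([p], [q])       # only a distant match is available
score_hit = mmd_cd([p, q], [q])    # an exact match exists in the generated set
```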