Unconstrained Scene Generation with Locally Conditioned Radiance Fields
About
We tackle the challenge of learning a distribution over complex, realistic, indoor scenes. In this paper, we introduce Generative Scene Networks (GSN), which learns to decompose scenes into a collection of many local radiance fields that can be rendered from a freely moving camera. Our model can be used as a prior to generate new scenes, or to complete a scene given only sparse 2D observations. Recent work has shown that generative models of radiance fields can capture properties such as multi-view consistency and view-dependent lighting. However, these models are specialized for constrained viewing of single objects, such as cars or faces. Due to the size and complexity of realistic indoor environments, existing models lack the representational capacity to adequately capture them. Our decomposition scheme scales to larger and more complex scenes while preserving details and diversity, and the learned prior enables high-quality rendering from viewpoints that differ significantly from the observed viewpoints. Compared to existing models, GSN produces quantitatively higher-quality scene renderings across several different scene datasets.
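The core idea of the decomposition, locally conditioning a radiance field on a 2D grid of latent codes covering the scene floor plan, can be sketched as follows. This is a minimal illustration with hypothetical names and shapes (the grid layout, code dimension, and normalized floor-plane coordinates are assumptions, not the paper's exact implementation), and the downstream MLP that maps point plus code to density and color is omitted:

```python
import numpy as np

def sample_local_code(grid, u, v):
    """Bilinearly sample a latent code from a 2D grid of local codes.

    grid : (H, W, D) array of per-cell latent codes (hypothetical layout).
    u, v : floor-plane coordinates normalized to [0, 1].
    """
    H, W, _ = grid.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * grid[y0, x0] + fx * grid[y0, x1]
    bot = (1 - fx) * grid[y1, x0] + fx * grid[y1, x1]
    return (1 - fy) * top + fy * bot

# A locally conditioned radiance query pairs the 3D sample point with the
# interpolated local code before an MLP predicts density and color.
rng = np.random.default_rng(0)
grid = rng.normal(size=(8, 8, 32))          # 8x8 grid of 32-dim codes (assumed sizes)
code = sample_local_code(grid, 0.37, 0.81)  # code governing this region of the scene
point_feature = np.concatenate([np.array([0.37, 0.20, 0.81]), code])
```

Because each query only reads the codes near its floor-plane location, capacity is spent locally rather than on one global latent, which is what lets the representation scale to full indoor scenes.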
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Unbounded 3D City Generation | KITTI-360 (test) | FID | 160 | 5 |
| Novel View Synthesis | Waymo Open Dataset (5 static scenes, 10% unseen poses) | PSNR | 16.83 | 4 |
| Reconstruction | Waymo Open Dataset (5 static scenes, 10% held-out frames) | PSNR | 17.98 | 4 |
| 3D Scene Generation | Matterport3D castle | KID | 0.05 | 3 |
| Generative Modeling | VizDoom 26 (test) | FID | 37.21 | 3 |
| Generative Modeling | Replica 52 (test) | FID | 41.75 | 3 |
| Generative Modeling | AVD 1 (test) | FID | 51.11 | 3 |
| View Synthesis | AVD 1 | Memorization L1 | 19 | 3 |
| View Synthesis | VizDoom | Memorization L1 | 0.07 | 3 |
| 3D Scene Generation | Replica frl_apt.4 | KID | 0.052 | 3 |
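Several rows above report FID, which compares Gaussian fits to Inception features of rendered versus real images (lower is better). A minimal sketch of the Fréchet distance formula itself, taking precomputed feature means and covariances as hypothetical inputs (in practice these come from an Inception network run over both image sets):

```python
import numpy as np
from scipy import linalg

def fid(mu1, cov1, mu2, cov2):
    """Fréchet distance between two Gaussians N(mu1, cov1) and N(mu2, cov2):
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^(1/2)).
    """
    covmean = linalg.sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):          # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))

# Identical feature statistics give a distance of zero.
mu = np.zeros(4)
cov = np.eye(4)
print(round(fid(mu, cov, mu, cov), 6))
```

KID (also in the table) serves the same purpose but uses an unbiased polynomial-kernel MMD estimate instead of a Gaussian fit, which behaves better with small sample counts.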