Slot-guided Volumetric Object Radiance Fields
About
We present a novel framework for 3D object-centric representation learning that decomposes complex scenes into individual objects from a single image, without supervision. The method, slot-guided Volumetric Object Radiance Fields (sVORF), composes volumetric object radiance fields using object slots as guidance to achieve unsupervised 3D scene decomposition. Specifically, sVORF obtains object slots from a single image via a transformer module, maps these slots to volumetric object radiance fields with a hypernetwork, and composes the object radiance fields at each 3D location under the guidance of the object slots. Moreover, sVORF significantly reduces memory requirements by rendering only small sets of pixels during training. We demonstrate the effectiveness of our approach with top results on scene decomposition and generation tasks on complex synthetic datasets (e.g., Room-Diverse). We also confirm the potential of sVORF to segment objects in real-world scenes (e.g., the LLFF dataset). We hope our approach can provide a preliminary understanding of the physical world and ease future research in 3D object-centric representation learning.
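
The slot-to-field pipeline described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the slots are random vectors standing in for the transformer output, `hyper_mlp` is a hypothetical single-layer hypernetwork, and the density-weighted composition in `compose` is an assumption borrowed from common compositional radiance-field practice, since the abstract does not spell out the exact slot-guided composition rule.

```python
import numpy as np

def hyper_mlp(slot, W_hyper, hidden=16):
    """Hypothetical hypernetwork: a linear map from one slot to the
    weights of a tiny two-layer field MLP (3 -> hidden -> 4)."""
    theta = W_hyper @ slot                      # flat parameter vector
    n1 = 3 * hidden
    W1 = theta[:n1].reshape(hidden, 3)
    W2 = theta[n1:n1 + 4 * hidden].reshape(4, hidden)
    return W1, W2

def object_field(x, W1, W2):
    """Evaluate one object's field at points x of shape (N, 3).
    Returns density (N,) and rgb (N, 3)."""
    h = np.maximum(W1 @ x.T, 0.0)               # ReLU hidden layer, (hidden, N)
    out = (W2 @ h).T                            # (N, 4)
    sigma = np.maximum(out[:, 0], 0.0)          # non-negative density
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))     # sigmoid keeps colors in (0, 1)
    return sigma, rgb

def compose(sigmas, rgbs, eps=1e-8):
    """Assumed composition rule: density-weighted average over K objects.
    sigmas has shape (K, N); rgbs has shape (K, N, 3)."""
    total = sigmas.sum(axis=0)                  # scene density, (N,)
    w = sigmas / (total + eps)                  # per-object weights, (K, N)
    rgb = (w[..., None] * rgbs).sum(axis=0)     # scene color, (N, 3)
    return total, rgb, w

# Toy usage: K random "slots" stand in for the transformer output.
rng = np.random.default_rng(0)
K, D, hidden, N = 4, 8, 16, 5
slots = rng.normal(size=(K, D))
W_hyper = rng.normal(size=(3 * hidden + 4 * hidden, D)) * 0.1
points = rng.uniform(-1.0, 1.0, size=(N, 3))

fields = [object_field(points, *hyper_mlp(s, W_hyper, hidden)) for s in slots]
sigmas = np.stack([f[0] for f in fields])       # (K, N)
rgbs = np.stack([f[1] for f in fields])         # (K, N, 3)
total, rgb, w = compose(sigmas, rgbs)
```

The per-object weights `w` double as a soft segmentation of each 3D point, which is one way such a model can yield object masks without supervision.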
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Scene Segmentation | Room-Chair | ARI | 87.8 | 4 |
| Scene Segmentation | Room-Diverse | ARI | 78.4 | 4 |
| Scene Decomposition | CLEVR 567 unseen appearance uORF-variant (test) | ARI | 83.9 | 4 |
| Scene Segmentation | CLEVR 567 | ARI | 82.7 | 4 |
| Novel View Synthesis | CLEVR-567 (test) | LPIPS | 0.0211 | 3 |
| Novel View Synthesis | Room-Diverse (test) | LPIPS | 0.1637 | 3 |
| Scene Decomposition | packed-CLEVR-11 (test) | ARI | 0.81 | 3 |
| Novel View Synthesis | Room-Chair (test) | LPIPS | 0.0824 | 3 |
| Novel View Synthesis | MSN | PSNR | 30.51 | 2 |
| 3D Scene Segmentation | MSN | ARI* | 63.4 | 2 |
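
Several rows above report ARI, the adjusted Rand index, which compares a predicted segmentation with ground-truth labels and is invariant to how the segments are numbered. The sketch below computes the standard ARI from a contingency table. It is a generic illustration, not the evaluation code behind these benchmarks, and it does not attempt to reproduce the `ARI*` variant, whose definition is not given here. The function returns a fraction, while some entries in the table appear to be on a 0 to 100 scale.

```python
import numpy as np

def adjusted_rand_index(labels_true, labels_pred):
    """Standard adjusted Rand index between two flat label assignments."""
    labels_true = np.asarray(labels_true).ravel()
    labels_pred = np.asarray(labels_pred).ravel()
    # Map arbitrary label values to 0..n-1 indices.
    _, t = np.unique(labels_true, return_inverse=True)
    _, p = np.unique(labels_pred, return_inverse=True)
    cont = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(cont, (t, p), 1)                  # contingency table

    def comb2(x):
        # Number of unordered pairs, "x choose 2", elementwise.
        return x * (x - 1) / 2.0

    total = comb2(labels_true.size)
    if total == 0:
        return 1.0                              # a single element is trivially consistent
    sum_cells = comb2(cont).sum()
    sum_rows = comb2(cont.sum(axis=1)).sum()
    sum_cols = comb2(cont.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / total
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0                              # degenerate case: both partitions trivial
    return (sum_cells - expected) / (max_index - expected)

# Same partition with swapped label ids scores as a perfect match.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that splits every true segment scores below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

For per-pixel masks, flatten the ground-truth and predicted segment maps before passing them in.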