DiffRF: Rendering-Guided 3D Radiance Field Diffusion
About
We introduce DiffRF, a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models. While existing diffusion-based methods operate on images, latent codes, or point cloud data, we are the first to directly generate volumetric radiance fields. To this end, we propose a 3D denoising model that operates directly on an explicit voxel grid representation. However, as radiance fields generated from a set of posed images can be ambiguous and contain artifacts, obtaining ground truth radiance field samples is non-trivial. We address this challenge by pairing the denoising formulation with a rendering loss, enabling our model to learn a deviated prior that favours good image quality over replicating fitting errors such as floating artifacts. In contrast to 2D diffusion models, our model learns multi-view consistent priors, enabling free-view synthesis and accurate shape generation. Compared to 3D GANs, our diffusion-based approach naturally enables conditional generation such as masked completion or single-view 3D synthesis at inference time.
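To make the paired objective concrete, below is a minimal PyTorch-style sketch of one training step combining a standard DDPM noise-prediction loss with a rendering loss on the denoised voxel radiance field. The `denoiser` (a 3D U-Net predicting the added noise), the differentiable `render` function, the epsilon-prediction parameterization, and the weight `lambda_render` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def diffusion_step_with_render_loss(denoiser, render, x0, images, poses,
                                    alphas_cumprod, lambda_render=0.1):
    """One training step: DDPM noise-prediction loss plus a rendering loss
    on the denoised radiance field (DiffRF-style pairing; the schedule and
    weighting here are assumptions for illustration).

    x0:     (B, C, D, H, W) ground-truth voxel radiance fields
    images: (B, H_img, W_img, 3) posed ground-truth renderings
    poses:  camera poses matching `images`
    """
    B = x0.shape[0]
    T = alphas_cumprod.shape[0]
    t = torch.randint(0, T, (B,), device=x0.device)

    # Forward diffusion: x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps
    a_t = alphas_cumprod[t].view(B, 1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a_t.sqrt() * x0 + (1.0 - a_t).sqrt() * eps

    # Standard denoising (epsilon-prediction) loss
    eps_pred = denoiser(x_t, t)
    loss_ddpm = F.mse_loss(eps_pred, eps)

    # Estimate x_0 from the predicted noise, render it, and compare the
    # result against the posed ground-truth images
    x0_pred = (x_t - (1.0 - a_t).sqrt() * eps_pred) / a_t.sqrt()
    rendered = render(x0_pred, poses)  # (B, H_img, W_img, 3)
    loss_render = F.mse_loss(rendered, images)

    return loss_ddpm + lambda_render * loss_render
```

Supervising through the renderer lets the model favour fields that produce clean images, rather than reproducing the fitting artifacts present in the ground-truth voxel grids.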
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Generation | OmniObject3D | FID (50K) | 147.6 | 9 |
| Image-to-3D Generation | ShapeNet | FID | 98.53 | 7 |
| Single-class 3D Generation | Amazon Berkeley Objects Tables (test) | FID | 27.06 | 5 |
| Unconditional shape generation | PhotoShape Chairs (test) | FID | 15.95 | 5 |
| Single-class 3D Generation | PhotoShape Chairs (test) | FID | 15.95 | 4 |