Pix2NeRF: Unsupervised Conditional $\pi$-GAN for Single Image to Neural Radiance Fields Translation
About
We propose a pipeline to generate Neural Radiance Fields (NeRF) of an object or a scene of a specific class, conditioned on a single input image. This is a challenging task, as training NeRF requires multiple views of the same scene, coupled with corresponding poses, which are hard to obtain. Our method is based on $\pi$-GAN, a generative model for unconditional 3D-aware image synthesis, which maps random latent codes to radiance fields of a class of objects. We jointly optimize (1) the $\pi$-GAN objective, to utilize its high-fidelity 3D-aware generation, and (2) a carefully designed reconstruction objective. The latter includes an encoder coupled with the $\pi$-GAN generator to form an auto-encoder. Unlike previous few-shot NeRF approaches, our pipeline is unsupervised and can be trained on independent images without 3D, multi-view, or pose supervision. Applications of our pipeline include 3D avatar generation, object-centric novel view synthesis from a single input image, and 3D-aware super-resolution.
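The joint optimization described above can be illustrated with a minimal sketch. The abstract does not give the exact loss terms, so everything here is an assumption made for illustration: a non-saturating GAN generator loss stands in for the $\pi$-GAN objective, a pixel-wise mean squared error stands in for the reconstruction objective, and the function names and the `recon_weight` parameter are hypothetical.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def generator_gan_loss(fake_logits):
    # Assumed non-saturating generator loss: mean of softplus(-D(G(z))).
    return float(np.mean(softplus(-np.asarray(fake_logits, dtype=float))))


def reconstruction_loss(image, reconstruction):
    # Assumed reconstruction term: mean squared error between the input
    # image and the image rendered from its encoded latent code.
    image = np.asarray(image, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    return float(np.mean((image - reconstruction) ** 2))


def joint_loss(fake_logits, image, reconstruction, recon_weight=1.0):
    # Weighted sum of the two objectives; recon_weight is a hypothetical
    # balancing hyperparameter, not a value taken from the paper.
    gan_term = generator_gan_loss(fake_logits)
    recon_term = reconstruction_loss(image, reconstruction)
    return gan_term + recon_weight * recon_term


if __name__ == "__main__":
    logits = np.zeros(4)                 # discriminator outputs on generated images
    image = np.zeros((8, 8, 3))          # stand-in input image
    rendered = np.full((8, 8, 3), 0.5)   # stand-in reconstruction
    print(joint_loss(logits, image, rendered, recon_weight=2.0))
```

In an actual training loop the two terms would be computed from network outputs and backpropagated through the encoder and generator; the sketch only shows how the two objectives combine into one scalar.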
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Novel View Synthesis | ShapeNet cars category | PSNR | 23.17 | 20 |
| Novel View Synthesis | ShapeNet chairs | SSIM | 0.91 | 9 |
| 3D Reconstruction | ShapeNet-SRN chairs (test) | PSNR | 18.14 | 8 |
| Novel View Synthesis | CelebA-HQ | ID Similarity | 19 | 7 |
| Edge2car | ShapeNet Car (test) | FID | 23.42 | 7 |
| Seg2face | CelebAMask-HQ (test) | FID | 54.23 | 7 |
| Segmentation-to-Cat Image Generation | AFHQ cat 34 (test) | FID | 43.92 | 7 |
| 3D-aware Image Synthesis | CARLA 64x64 resolution | FID | 10.54 | 5 |
| 3D-aware Image Synthesis | CARLA 128x128 resolution | FID | 27.23 | 5 |
| Novel View Synthesis | FFHQ | FID | 32.44 | 5 |