ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
About
We introduce ZeroNVS, a 3D-aware diffusion model for single-image novel view synthesis of in-the-wild scenes. While existing methods are designed for single objects with masked backgrounds, we propose new techniques to address challenges introduced by in-the-wild multi-object scenes with complex backgrounds. Specifically, we train a generative prior on a mixture of data sources that capture object-centric, indoor, and outdoor scenes. To address issues arising from the data mixture, such as depth-scale ambiguity, we propose a novel camera conditioning parameterization and normalization scheme. Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views. Our model sets a new state-of-the-art LPIPS result on the DTU dataset in the zero-shot setting, even outperforming methods trained specifically on DTU. We further adapt the challenging Mip-NeRF 360 dataset as a new benchmark for single-image novel view synthesis and demonstrate strong performance in this setting. Our code and data are available at http://kylesargent.github.io/zeronvs/
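The exact normalization scheme is detailed in the paper rather than here; as a rough illustration of how depth-scale ambiguity can be handled, the sketch below rescales a camera pose's translation by a robust statistic of an estimated depth map, so that scenes from heterogeneous data sources share a comparable scale. All function and variable names are hypothetical, not the paper's implementation.

```python
import numpy as np

def normalize_camera_scale(cam_to_world, depth_map, q=0.2):
    """Illustrative sketch (not ZeroNVS's actual scheme): resolve
    depth-scale ambiguity by rescaling the camera translation so the
    scene has roughly unit scale.

    cam_to_world: (4, 4) camera-to-world pose matrix.
    depth_map:    (H, W) estimated depth for the input view.
    q:            quantile used as a robust scene-scale statistic.
    """
    # A low quantile of the depth is more robust to far-away
    # background pixels than the mean or the maximum.
    scale = float(np.quantile(depth_map, q))
    normalized = cam_to_world.copy()
    # Only the translation carries metric scale; rotation is scale-free.
    normalized[:3, 3] /= scale
    return normalized, scale
```

With a depth map whose 20th-percentile depth is 2.0, a translation of (2, 0, 4) becomes (1, 0, 2), so two captures of the same scene at different metric scales map to the same normalized pose.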
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Novel View Synthesis | Tanks&Temples (test) | PSNR 13.18 | 239 |
| Novel View Synthesis | Mip-NeRF 360 (test) | PSNR 15.81 | 166 |
| Novel View Synthesis | LLFF | PSNR 18.79 | 124 |
| Novel View Synthesis | RealEstate10K | PSNR 23.73 | 116 |
| Novel View Synthesis | Mip-NeRF 360 | PSNR 15.99 | 104 |
| Novel View Synthesis | DTU | PSNR 17.92 | 100 |
| Novel View Synthesis | CO3D | PSNR 20.5 | 24 |
| Novel View Synthesis | RealEstate10K Hard | PSNR 14.24 | 20 |
| Novel View Synthesis | RealEstate10K Easy | PSNR 16.5 | 20 |
| Few-view 3D Reconstruction | RealEstate10K (test) | PSNR 23.73 | 20 |