InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images
About
We present a method for learning to generate unbounded flythrough videos of natural scenes starting from a single view, where this capability is learned from a collection of single photographs, without requiring camera poses or even multiple views of each scene. To achieve this, we propose a novel self-supervised view generation training paradigm in which we sample and render virtual camera trajectories, including cyclic ones, allowing our model to learn stable view generation from a collection of single views. At test time, despite never having seen a video during training, our approach can take a single image and generate long camera trajectories comprising hundreds of new views with realistic and diverse content. We compare our approach with recent state-of-the-art supervised view generation methods that require posed multi-view videos and demonstrate superior performance and synthesis quality.
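
To make the idea of a cyclic virtual camera trajectory concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation: the function names `sample_forward_poses` and `make_cyclic`, the ground-plane pose representation `(x, z, yaw)`, and the step and yaw parameters are all hypothetical choices made for illustration. The sketch samples a random forward path and then retraces it in reverse, so the last pose coincides with the first. One plausible reading of the abstract is that a view rendered at the end of such a cycle can be compared against the original photograph, which would give a training signal without any video.

```python
import math
import random


def sample_forward_poses(num_steps, step_size=0.05, max_yaw_deg=2.0, seed=0):
    """Sample a forward camera path as a list of (x, z, yaw) poses.

    Hypothetical simplification: the camera moves on a ground plane,
    starting at the origin and looking along +z.
    """
    rng = random.Random(seed)
    x, z, yaw = 0.0, 0.0, 0.0
    poses = [(x, z, yaw)]
    for _ in range(num_steps):
        # Turn by a small random angle, then step along the new heading.
        yaw += math.radians(rng.uniform(-max_yaw_deg, max_yaw_deg))
        x += step_size * math.sin(yaw)
        z += step_size * math.cos(yaw)
        poses.append((x, z, yaw))
    return poses


def make_cyclic(poses):
    """Append the reversed path (minus the turning point) so the
    trajectory ends exactly at its starting pose."""
    return poses + poses[-2::-1]


forward = sample_forward_poses(num_steps=5)
cyclic = make_cyclic(forward)
print(len(forward), len(cyclic), cyclic[0] == cyclic[-1])
```

With `num_steps=5`, the forward path has 6 poses and the cyclic path has 11, and the first and last poses are identical. A full pipeline would use 3D camera matrices and a renderer, neither of which is specified in the text above.
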
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| View Synthesis | Tanks&Temples | PSNR 10.78 | 15 |
| Single-view Novel View Synthesis | DL3DV (Long-term (200th frame)) | PSNR 9.12 | 13 |
| Single-view Novel View Synthesis | RealEstate10K Long-term, 200th frame 84 (test) | PSNR 10.22 | 13 |
| Single-view Novel View Synthesis | RealEstate10K Short-term, 50th frame 84 (test) | PSNR 14.31 | 13 |
| Single-view Novel View Synthesis | DL3DV Short-term (50th frame) | PSNR 10.21 | 13 |
| Scene Extrapolation | LHQ (test) | FID 26.24 | 6 |
| Unbounded 3D scene generation | Large-scale Internet landscape image dataset 1.0 (test) | CE 1.213 | 5 |
| Perpetual view generation | RealEstate-10K | PSNR 12.29 | 5 |
| Novel View Synthesis | ACID (10 generated sequences) | PSNR 18.92 | 3 |
| Scene Extrapolation | ACID | Avg Points Reconstructed 6.12e+5 | 3 |
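
Most rows in the table report PSNR (peak signal-to-noise ratio, in dB). As a reminder of what that number measures, the following is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), applied to flat lists of pixel values. The function name `psnr` and the 8-bit default `max_value=255.0` are assumptions for illustration; the exact evaluation protocol behind each benchmark (resolution, frame selection, color space) is not stated here.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of pixel values. max_value is the largest possible
    pixel value (255 for 8-bit images, an assumed default)."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


print(psnr([100, 100], [110, 90]))
```

Higher values mean the generated frame is closer to the reference, and identical inputs give infinity.
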