Dynamic View Synthesis from Dynamic Monocular Video
About
We present an algorithm for generating novel views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene. Our work builds upon recent advances in neural implicit representation and uses continuous and differentiable functions for modeling the time-varying structure and the appearance of the scene. We jointly train a time-invariant static NeRF and a time-varying dynamic NeRF, and learn how to blend the results in an unsupervised manner. However, learning this implicit function from a single video is highly ill-posed (with infinitely many solutions that match the input video). To resolve the ambiguity, we introduce regularization losses to encourage a more physically plausible solution. We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
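To make the static/dynamic blending concrete, here is a minimal sketch of how per-sample outputs from the two branches could be composited along a ray. All tensor names are hypothetical stand-ins, and the blending is simplified to a per-sample convex combination with a weight `blend` assumed to come from the dynamic branch; the paper's exact formulation may differ.

```python
import torch

def blended_render(sigma_s, rgb_s, sigma_d, rgb_d, blend, deltas):
    """Composite rays by blending a static and a dynamic NeRF branch.

    Shapes (hypothetical): densities sigma_* and blend are [R, S],
    colors rgb_* are [R, S, 3], deltas (sample spacing) is [R, S].
    """
    # Per-sample blend of the time-invariant (static) and
    # time-varying (dynamic) branches; blend is in [0, 1].
    sigma = blend * sigma_d + (1.0 - blend) * sigma_s
    rgb = blend[..., None] * rgb_d + (1.0 - blend[..., None]) * rgb_s

    # Standard NeRF alpha compositing along each ray.
    alpha = 1.0 - torch.exp(-sigma * deltas)                    # opacity per sample
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)          # transmittance
    trans = torch.cat([torch.ones_like(trans[:, :1]),
                       trans[:, :-1]], dim=-1)                  # shift: light reaching sample
    weights = trans * alpha
    return (weights[..., None] * rgb).sum(dim=-2)               # [R, 3] pixel colors

# Example usage with random inputs: 1024 rays, 64 samples each.
R, S = 1024, 64
colors = blended_render(torch.rand(R, S), torch.rand(R, S, 3),
                        torch.rand(R, S), torch.rand(R, S, 3),
                        torch.rand(R, S), torch.full((R, S), 0.01))
```

Because the blending weight is optimized jointly with both branches via the photometric loss, no explicit motion segmentation labels are needed, which is what makes the decomposition unsupervised.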
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Dynamic Scene Novel View Synthesis | NVIDIA video dataset (average over all scenes) | PSNR | 26.1 | 17 |
| Novel View Synthesis | Nvidia Dataset | PSNR | 26.1 | 15 |
| Novel View Synthesis | Dynamic Scene | PSNR (Jumping) | 24.68 | 9 |
| Novel View Synthesis | Stereo Blur Dataset (test) | PSNR | 22.3 | 9 |
| Novel View Synthesis | Nvidia Dynamic Scene Dataset (Full) | SSIM | 0.921 | 5 |
| Novel View Synthesis | Nvidia Dynamic Scene Dataset (Dynamic Only) | SSIM | 0.778 | 5 |
| View Synthesis | UCSD dataset (test) | SSIM (Full) | 0.943 | 5 |