MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views
About
We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations. This setting is inherently ill-posed due to minimal overlap among input views and insufficient visual information provided, making it challenging for conventional methods to achieve high-quality results. Our MVSplat360 addresses this by effectively combining geometry-aware 3D reconstruction with temporally consistent video generation. Specifically, it refactors a feed-forward 3D Gaussian Splatting (3DGS) model to render features directly into the latent space of a pre-trained Stable Video Diffusion (SVD) model, where these features then act as pose and visual cues to guide the denoising process and produce photorealistic 3D-consistent views. Our model is end-to-end trainable and supports rendering arbitrary views with as few as 5 sparse input views. To evaluate MVSplat360's performance, we introduce a new benchmark using the challenging DL3DV-10K dataset, where MVSplat360 achieves superior visual quality compared to state-of-the-art methods on wide-sweeping or even 360° NVS tasks. Experiments on the existing benchmark RealEstate10K also confirm the effectiveness of our model. The video results are available on our project page: https://donydchen.github.io/mvsplat360.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Novel View Synthesis | DL3DV | PSNR | 16.37 | 61 |
| Multi-view Generation | RealEstate10K | MEt3R | 0.022 | 7 |
| Novel View Synthesis | DL3DV 140 (test) | PSNR | 18.54 | 6 |
| Novel View Synthesis | RE10K wide-view baseline (test) | PSNR | 21.09 | 5 |
| Novel View Synthesis | RE10K narrow-view baseline (test) | PSNR | 23.88 | 5 |
| Scene-level reconstruction | DL3DV 10K (test) | PSNR | 17.42 | 4 |
| Novel View Synthesis | ACID Zero-Shot v1 (test) | PSNR | 21.75 | 4 |
| Multi-view consistency | DL3DV | MEt3R Score | 0.0579 | 3 |
| Multi-view consistency | ACID | MEt3R | 0.0196 | 3 |
| 3D Consistency | DL3DV (test) | LPIPS | 0.35 | 3 |
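
Several rows above report PSNR (peak signal-to-noise ratio, in dB; higher is better). As a reading aid, the following is a minimal sketch of the standard PSNR definition, 10 · log10(MAX² / MSE), applied to flattened pixel values. The function name `psnr`, the `max_value` parameter, and the assumption that pixels are scaled to [0, 1] are illustrative choices, not taken from the paper; the benchmarks' actual evaluation protocol (resolution, color space, averaging across views) is not stated here and may differ.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Standard PSNR in dB between two equal-length sequences of pixel values.

    Hypothetical helper for illustration only: `max_value` is the peak pixel
    value (1.0 assumes pixels scaled to [0, 1]; use 255 for 8-bit images).
    """
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under this definition, a uniform per-pixel error of 0.1 on a [0, 1] scale gives an MSE of 0.01 and therefore a PSNR of about 20 dB, which gives a rough sense of scale for the values in the table.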