MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
About
We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering that integrates an efficient volume rendering design for novel view synthesis. 3) To support fast fine-tuning for specific scenes, we introduce a multi-view geometric consistent aggregation strategy to effectively aggregate the point clouds generated by the generalizable model, serving as the initialization for per-scene optimization. Compared with previous generalizable NeRF-based methods, which typically require minutes of fine-tuning and seconds of rendering per image, MVSGaussian achieves real-time rendering with better synthesis quality for each scene. Compared with vanilla 3D-GS, MVSGaussian achieves better view synthesis at a lower training computational cost. Extensive experiments on the DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples datasets validate that MVSGaussian attains state-of-the-art performance with convincing generalizability, real-time rendering speed, and fast per-scene optimization.
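To make the idea of multi-view geometric consistency in point 3) more concrete, the following is a minimal sketch of one common MVS-style check, not the authors' implementation. It assumes pinhole intrinsics, a 4x4 reference-to-source camera transform, and a single relative depth tolerance; the function name `consistency_mask` and the parameter `rel_tol` are hypothetical. A reference pixel is kept only if its back-projected 3D point, seen from the source view, agrees with the depth predicted for that source view.

```python
import numpy as np

def consistency_mask(depth_ref, depth_src, K_ref, K_src, T_ref_to_src, rel_tol=0.01):
    """Return a boolean (H, W) mask of reference pixels whose depth agrees
    with the source-view depth map (illustrative one-way check only)."""
    h, w = depth_ref.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates, shape (3, N).
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)]).astype(np.float64)

    # Back-project reference pixels to 3D points in the reference camera frame.
    pts_ref = (np.linalg.inv(K_ref) @ pix) * depth_ref.ravel()
    # Move the points into the source camera frame.
    pts_src = T_ref_to_src[:3, :3] @ pts_ref + T_ref_to_src[:3, 3:4]

    z = pts_src[2]                      # depth of each point as seen from the source view
    in_front = z > 1e-8
    safe_z = np.where(in_front, z, 1.0)  # avoid division by zero for invalid points
    proj = K_src @ pts_src
    us = np.round(proj[0] / safe_z).astype(int)
    vs = np.round(proj[1] / safe_z).astype(int)

    hs, ws = depth_src.shape
    inside = in_front & (us >= 0) & (us < ws) & (vs >= 0) & (vs < hs)

    # Compare the projected depth with the nearest-neighbour source depth.
    mask = np.zeros(h * w, dtype=bool)
    d_src = depth_src[vs[inside], us[inside]]
    mask[inside] = np.abs(d_src - z[inside]) <= rel_tol * z[inside]
    return mask.reshape(h, w)

# Toy example: two identical cameras looking at a fronto-parallel plane.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
T = np.eye(4)
plane = np.full((4, 4), 2.0)
mask_good = consistency_mask(plane, plane, K, K, T)
mask_bad = consistency_mask(plane, np.full((4, 4), 3.0), K, K, T)
```

In a fusion pipeline, a mask like this would typically be computed against several source views, and only points that pass in enough views would be aggregated into the point cloud used to initialize per-scene optimization. The exact thresholds and voting rule used by MVSGaussian are not stated in the abstract.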
Related benchmarks
| Task | Dataset | PSNR (dB) | Rank |
|---|---|---|---|
| Novel View Synthesis | LLFF | 24.07 | 124 |
| Novel View Synthesis | Blender | 25.54 | 60 |
| Novel View Synthesis | Shiny | 20.49 | 28 |
| Novel View Synthesis | DTU 1 (test) | 28.21 | 22 |
| Novel View Synthesis | Real Forward-facing 640 x 960 (test) | 24.07 | 21 |
| Novel View Synthesis | NeRF Synthetic 800 x 800 (test) | 26.46 | 21 |
| Novel View Synthesis | Camera | 29.326 | 6 |
| Novel View Synthesis | GoPro | 27.413 | 6 |
| Novel View Synthesis | Mobile | 19.927 | 6 |
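
All results above are reported as PSNR (peak signal-to-noise ratio), defined as 10·log10(MAX² / MSE) between a rendered image and its ground truth. The snippet below is a minimal sketch of that standard definition, assuming images stored as floating-point arrays in [0, 1]; the function name `psnr` and the `max_val` parameter are illustrative, and the snippet does not reproduce the evaluation protocol (resolution, masking, view selection) of any benchmark listed here.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a uniform error of 0.1 on a [0, 1] image gives MSE = 0.01.
target = np.zeros((8, 8, 3))
pred = np.full((8, 8, 3), 0.1)
value = psnr(pred, target)
```

Because PSNR is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error, which helps when comparing rows of the table.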