In Depth We Trust: Reliable Monocular Depth Supervision for Gaussian Splatting
About
Using accurate depth priors in 3D Gaussian Splatting helps mitigate artifacts caused by sparse training data and textureless surfaces. However, acquiring accurate depth maps requires specialized acquisition systems. Foundation monocular depth estimation models offer a cost-effective alternative, but they suffer from scale ambiguity, multi-view inconsistency, and local geometric inaccuracies, which can degrade rendering performance when applied naively. This paper addresses the challenge of reliably leveraging monocular depth priors for Gaussian Splatting (GS) rendering enhancement. To this end, we introduce a training framework integrating scale-ambiguous and noisy depth priors into geometric supervision. We highlight the importance of learning from weakly aligned depth variations. We introduce a method to isolate ill-posed geometry for selective monocular depth regularization, restricting the propagation of depth inaccuracies into well-reconstructed 3D structures. Extensive experiments across diverse datasets show consistent improvements in geometric accuracy, leading to more faithful depth estimation and higher rendering quality across different GS variants and monocular depth backbones tested.
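To make the scale ambiguity mentioned above concrete, here is a minimal sketch of a common baseline: fitting one global scale and shift that maps monocular depth onto reference depth by least squares. It is not the paper's method. The abstract instead stresses learning from weakly aligned depth variations and regularizing selectively. The function names `fit_scale_shift` and `apply_scale_shift` are hypothetical, and flat Python lists stand in for the valid pixels of a depth map.

```python
from typing import List, Sequence, Tuple


def fit_scale_shift(mono: Sequence[float], ref: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ref ~= scale * mono + shift (hypothetical helper).

    Assumption: `mono` and `ref` hold depth values for the same valid pixels.
    """
    n = len(mono)
    if n != len(ref) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_m = sum(mono) / n
    mean_r = sum(ref) / n
    var_m = sum((m - mean_m) ** 2 for m in mono)
    if var_m == 0.0:
        raise ValueError("monocular depth is constant; scale is undetermined")
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(mono, ref))
    scale = cov / var_m               # closed-form slope of simple linear regression
    shift = mean_r - scale * mean_m   # intercept follows from the two means
    return scale, shift


def apply_scale_shift(mono: Sequence[float], scale: float, shift: float) -> List[float]:
    """Map monocular depth into the reference depth range."""
    return [scale * m + shift for m in mono]


if __name__ == "__main__":
    mono_depth = [0.1, 0.2, 0.4, 0.8]                 # made-up relative depths
    ref_depth = [2.0 * m + 0.5 for m in mono_depth]   # made-up reference depths
    s, t = fit_scale_shift(mono_depth, ref_depth)
    print(s, t, apply_scale_shift(mono_depth, s, t))
```

A single global fit like this cannot correct the multi-view inconsistency or local geometric errors the abstract describes, which is why naive use of aligned monocular depth can still degrade rendering.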
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Novel View Synthesis | ScanNet++ | PSNR | 24.53 | 67 |
| Depth Estimation | ScanNet++ | AbsRel | 0.108 | 40 |
| Novel View Synthesis | TanksAndTemples Low Data | PSNR | 20.578 | 9 |
| Novel View Synthesis | TanksAndTemples Moderate Data | PSNR | 23.414 | 9 |
| Novel View Synthesis | MipNeRF 360 Low Data | PSNR | 22.253 | 9 |
| Novel View Synthesis | MipNeRF 360 Moderate Data | PSNR | 25.716 | 9 |
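
For readers unfamiliar with the two metrics in the table, the sketch below gives their standard definitions: PSNR (higher is better) for rendered images and absolute relative error, AbsRel (lower is better), for depth. It assumes image intensities in [0, 1] and flat lists of values. The benchmark's exact evaluation protocol (masking, resolution, depth range) is not stated here, so the function names and defaults are illustrative only.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming intensities in [0, max_val]."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    if len(pred) != len(gt):
        raise ValueError("inputs must have equal length")
    errors = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0.0]
    if not errors:
        raise ValueError("no valid ground-truth depth values")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(psnr([0.5, 0.5], [0.6, 0.4]))     # made-up pixel values
    print(abs_rel([1.1, 1.8], [1.0, 2.0]))  # made-up depths in metres
```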