Geo-ID: Test-Time Geometric Consensus for Cross-View Consistent Intrinsics
About
Intrinsic image decomposition aims to estimate physically based rendering (PBR) parameters such as albedo, roughness, and metallicity from images. While recent methods achieve strong single-view predictions, applying them independently to multiple views of the same scene often yields inconsistent estimates, limiting their use in downstream applications such as editable neural scenes and 3D reconstruction. Video-based models can improve cross-frame consistency, but they require dense, ordered sequences and substantial compute, which makes them ill-suited to sparse, unordered image collections. We propose Geo-ID, a test-time framework that repurposes pretrained single-view intrinsic predictors to produce cross-view consistent decompositions. It couples independent per-view predictions through sparse geometric correspondences that form uncertainty-aware consensus targets. Geo-ID is model-agnostic, requires no retraining or inverse rendering, and applies directly to off-the-shelf intrinsic predictors. Experiments on synthetic benchmarks and real-world scenes demonstrate substantial improvements in cross-view intrinsic consistency as the number of views increases, while maintaining comparable single-view decomposition performance. We further show that the resulting consistent intrinsics enable coherent appearance editing and relighting in downstream neural scene representations.
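To make the idea of an uncertainty-aware consensus target concrete, the sketch below fuses per-view predictions at matched scene points with inverse-variance weights. This is an illustration only, not the paper's actual formulation: the function name `consensus_targets`, the array layout, and the choice of inverse-variance weighting are all assumptions, and the abstract does not say how correspondences or uncertainties are obtained.

```python
import numpy as np

def consensus_targets(preds, variances, eps=1e-8):
    """Fuse per-view intrinsic predictions at corresponding points.

    Hypothetical helper, not taken from the paper.
    preds:     (V, N, C) predictions from V views at N matched points,
               with C channels (e.g. RGB albedo).
    variances: (V, N) per-view, per-point uncertainty (assumed given).
    Returns an (N, C) array of consensus targets.
    """
    preds = np.asarray(preds, dtype=np.float64)
    variances = np.asarray(variances, dtype=np.float64)
    # Assumption: a view's weight is the inverse of its predicted variance.
    weights = 1.0 / (variances + eps)
    # Normalize across views so the weights at each point sum to one.
    weights = weights / weights.sum(axis=0, keepdims=True)
    # Weighted average over the view axis.
    return (weights[..., None] * preds).sum(axis=0)

# Two views observe the same point; the first view is more confident.
preds = np.array([[[0.2, 0.2, 0.2]],
                  [[0.8, 0.8, 0.8]]])
variances = np.array([[0.01],
                      [0.09]])
target = consensus_targets(preds, variances)
print(target)  # each channel is approximately 0.26
```

In this toy case the consensus sits much closer to the confident view (0.2) than to the uncertain one (0.8). In a test-time setting, such targets could then act as soft constraints that pull each per-view prediction toward agreement at the matched points.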
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Intrinsic Decomposition | Hypersim | Albedo PSNR 18.5 | 17 |
| Intrinsic Decomposition | InteriorVerse | Albedo PSNR 19.5 | 14 |
| Cross-view intrinsic consistency | MipNeRF 360 Outdoor | Albedo 0.054 | 13 |
| Cross-view intrinsic consistency | MipNeRF 360 Indoor | Albedo 0.11 | 13 |
| Cross-view intrinsic consistency | Tanks&Temples | Albedo 10.6 | 13 |
| Cross-view intrinsic consistency | InteriorVerse GT | Albedo 10.3 | 13 |