
MVInverse: Feed-forward Multi-view Inverse Rendering in Seconds

About

Multi-view inverse rendering aims to recover geometry, materials, and illumination consistently across multiple viewpoints. When applied to multi-view images, existing single-view approaches often ignore cross-view relationships, leading to inconsistent results. In contrast, multi-view optimization methods rely on slow differentiable rendering and per-scene refinement, making them computationally expensive and hard to scale. To address these limitations, we introduce a feed-forward multi-view inverse rendering framework that directly predicts spatially varying albedo, metallic, roughness, diffuse shading, and surface normals from sequences of RGB images. By alternating attention across views, our model captures both intra-view long-range lighting interactions and inter-view material consistency, enabling coherent scene-level reasoning within a single forward pass. Due to the scarcity of real-world training data, models trained on existing synthetic datasets often struggle to generalize to real-world scenes. To overcome this limitation, we propose a consistency-based finetuning strategy that leverages unlabeled real-world videos to enhance both multi-view coherence and robustness under in-the-wild conditions. Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art performance in terms of multi-view consistency, material and normal estimation quality, and generalization to real-world imagery.

Project page: https://maddog241.github.io/mvinverse-page/

Xiangzuo Wu, Chengwei Ren, Jun Zhou, Xiu Li, Yuan Liu • 2025
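
The abstract's "alternating attention across views" can be illustrated with a toy sketch. This is not the authors' implementation. It assumes single-head attention with identity projections (no learned weights), and it assumes the inter-view pass attends jointly over all tokens of all views, which the abstract does not specify. The names `attend` and `alternating_block` are hypothetical.

```python
import math


def attend(tokens):
    """Toy single-head self-attention with identity Q/K/V projections."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product scores of this query against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append(
            [sum(w * v[d] for w, v in zip(weights, tokens)) / total for d in range(dim)]
        )
    return out


def alternating_block(views):
    """views: list of V views, each a list of N tokens, each a list of D floats."""
    # Intra-view pass: each token attends only to tokens of its own view.
    views = [attend(view) for view in views]
    # Inter-view pass (assumed design): attend over the tokens of all views at once.
    tokens_per_view = len(views[0])
    flat = [token for view in views for token in view]
    mixed = attend(flat)
    # Restore the per-view layout.
    return [
        mixed[i * tokens_per_view:(i + 1) * tokens_per_view]
        for i in range(len(views))
    ]
```

A real model would stack many such blocks with learned projections, multiple heads, and residual connections. The sketch only shows how the token grouping changes between the two passes.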

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Surface Normal Prediction | NYU V2 | Mean Error: 16.1 | 100 |
| Surface Normal Estimation | ScanNet Normal Benchmark (test) | Angle Error Threshold (11.25°): 66.3 | 18 |
| Video Surface Normal Estimation | Sintel | Mean Angular Error: 31.3 | 12 |
| Surface Normal Estimation | iBIMS-1 | MAE: 16 | 7 |
| Surface Normal Estimation | OASIS | Mean Angular Error: 23.1 | 7 |
| Single-view inverse rendering | Interiorverse (test) | Albedo PSNR: 23 | 6 |
| Multi-view material consistency estimation | Hypersim (test) | Albedo RMSE: 0.0494 | 5 |
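
For readers unfamiliar with the surface-normal metrics listed above, here is a minimal sketch of how a mean angular error and a threshold accuracy are commonly computed. It assumes the 11.25° figure is the percentage of pixels whose angular error falls below that threshold. The page does not define its metrics, and the function names are hypothetical.

```python
import math


def angular_errors_deg(pred, gt):
    """Per-pixel angle in degrees between predicted and ground-truth normals."""
    errors = []
    for p, g in zip(pred, gt):
        dot = sum(a * b for a, b in zip(p, g))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in g))
        # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        cos = max(-1.0, min(1.0, dot / norm))
        errors.append(math.degrees(math.acos(cos)))
    return errors


def normal_metrics(pred, gt, threshold_deg=11.25):
    """Return (mean angular error in degrees, percent of pixels under threshold)."""
    errors = angular_errors_deg(pred, gt)
    mean_error = sum(errors) / len(errors)
    within = 100.0 * sum(e < threshold_deg for e in errors) / len(errors)
    return mean_error, within
```

Lower is better for the mean angular error, and higher is better for the threshold accuracy. Benchmarks typically also mask out pixels without valid ground truth, which this sketch omits.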
