DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models
About
Understanding and modeling lighting effects are fundamental tasks in computer vision and graphics. Classic physically-based rendering (PBR) accurately simulates light transport, but relies on precise scene representations (explicit 3D geometry, high-quality material properties, and lighting conditions) that are often impractical to obtain in real-world scenarios. We therefore introduce DiffusionRenderer, a neural approach that addresses the dual problems of inverse and forward rendering within a holistic framework. Leveraging powerful video diffusion model priors, the inverse rendering model accurately estimates G-buffers from real-world videos, providing both an interface for image editing tasks and training data for the rendering model. Conversely, our rendering model generates photorealistic images from G-buffers without explicit light transport simulation. Experiments demonstrate that DiffusionRenderer effectively approximates both inverse and forward rendering, consistently outperforming the state of the art. Our model enables practical applications from a single video input, including relighting, material editing, and realistic object insertion.
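The overall flow (invert a video into G-buffers, optionally edit them, then re-render under new lighting) can be sketched as below. This is a minimal illustrative sketch, not the actual DiffusionRenderer API: the function names, G-buffer channels, and tensor shapes are assumptions for exposition, and the diffusion models are replaced by shape-preserving stubs.

```python
# Hypothetical sketch of the inverse -> forward rendering pipeline.
# All names and shapes are illustrative assumptions, not the real API.
import numpy as np

def inverse_render(video):
    """Stand-in for the video-diffusion inverse renderer: maps an input
    clip (T, H, W, 3) to per-frame G-buffers. A dummy emits zero-filled
    buffers of plausible shapes; the real model predicts these from pixels."""
    t, h, w, _ = video.shape
    return {
        "albedo":    np.zeros((t, h, w, 3)),
        "normal":    np.zeros((t, h, w, 3)),
        "roughness": np.zeros((t, h, w, 1)),
        "metallic":  np.zeros((t, h, w, 1)),
        "depth":     np.zeros((t, h, w, 1)),
    }

def forward_render(gbuffers, env_map):
    """Stand-in for the video-diffusion forward renderer: synthesizes a
    video from G-buffers plus a target lighting condition, with no
    explicit light-transport simulation."""
    t, h, w, _ = gbuffers["albedo"].shape
    return np.zeros((t, h, w, 3))

# Relighting from a single video: invert, then re-render under new lighting.
video = np.zeros((8, 64, 64, 3))                        # 8-frame input clip
gbuffers = inverse_render(video)                        # estimate G-buffers
new_light = np.zeros((16, 32, 3))                       # target environment map
relit = forward_render(gbuffers, env_map=new_light)     # relit output clip
assert relit.shape == video.shape
```

Material editing fits the same loop: modify a buffer (e.g. scale `gbuffers["roughness"]`) before calling the forward renderer.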
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reflective Object Reconstruction | Glossy Synthetic | PSNR | 23.4 | 19 |
| Intrinsic Decomposition | Hypersim | Albedo PSNR | 22.2 | 17 |
| Intrinsic Decomposition | InteriorVerse | Albedo PSNR | 21.9 | 14 |
| Cross-view intrinsic consistency | MipNeRF 360 Outdoor | Albedo | 0.043 | 13 |
| Cross-view intrinsic consistency | MipNeRF 360 Indoor | Albedo | 0.068 | 13 |
| Cross-view intrinsic consistency | Tanks&Temples | Albedo | 6 | 13 |
| Cross-view intrinsic consistency | InteriorVerse GT | Albedo | 5.8 | 13 |
| Relighting | ADT | LPIPS | 0.0916 | 9 |
| Image-to-image relighting | MIIW cross-scene (test) | RMSE (raw) | 0.399 | 9 |
| Relighting | Objaverse | LPIPS | 0.0609 | 9 |