Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer
About
Many machine learning models operate on images, but ignore the fact that images are 2D projections formed by 3D geometry interacting with light, in a process called rendering. Enabling ML models to understand image formation might be key for generalization. However, due to an essential rasterization step involving discrete assignment operations, rendering pipelines are non-differentiable and thus largely inaccessible to gradient-based ML techniques. In this paper, we present DIB-R, a differentiable rendering framework which allows gradients to be analytically computed for all pixels in an image. Key to our approach is to view foreground rasterization as a weighted interpolation of local properties and background rasterization as a distance-based aggregation of global geometry. Our approach allows for accurate optimization over vertex positions, colors, normals, light directions and texture coordinates through a variety of lighting models. We showcase our approach in two ML applications: single-image 3D object prediction and 3D textured object generation, both trained exclusively with 2D supervision. Our project website is: https://nv-tlabs.github.io/DIB-R/
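The two rasterization views described above can be sketched in a few lines. This is a minimal, illustrative sketch (the function names and the exponential falloff parameter are assumptions, not the paper's actual API): a foreground pixel's attribute is a barycentric-weighted interpolation of the three vertex attributes of the covering face, and a background pixel receives a soft alpha that decays with its distance to the nearest face, so both remain differentiable.

```python
import numpy as np

def interpolate_foreground(vertex_attrs, bary_weights):
    """Foreground rasterization as weighted interpolation.

    vertex_attrs: (3, C) attributes (e.g. colors) at the face's vertices.
    bary_weights: (3,) barycentric coordinates of the pixel (sum to 1).
    Returns a (C,) value differentiable w.r.t. attributes and weights.
    """
    return bary_weights @ vertex_attrs

def background_alpha(dist_to_nearest_face, sigma=0.01):
    """Background rasterization as distance-based aggregation (sketch).

    A pixel outside all faces gets a soft alpha that decays with its
    distance to the nearest face, so gradients can still pull geometry
    toward the pixel. sigma (an assumed hyperparameter) sets the falloff.
    """
    return np.exp(-dist_to_nearest_face / sigma)

# Example: interpolate per-vertex colors at one covered pixel.
colors = np.array([[1.0, 0.0, 0.0],   # red vertex
                   [0.0, 1.0, 0.0],   # green vertex
                   [0.0, 0.0, 1.0]])  # blue vertex
w = np.array([0.5, 0.3, 0.2])
pixel = interpolate_foreground(colors, w)  # -> [0.5, 0.3, 0.2]
```

Because both maps are smooth in the vertex positions and attributes, gradients of any pixel-wise image loss flow back to the mesh, which is what enables the 2D-supervised training described below.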
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Object Reconstruction | ShapeNet (test) | Mean IoU | 0.612 | 80 |
| 3D Reconstruction from a single 2D image | ShapeNet (test) | Volumetric IoU (Airplane) | 57 | 11 |
| Single-image 3D Reconstruction | CUB bird dataset unseen (test) | Mask IoU (%) | 75.7 | 8 |
| 3D Reconstruction | PASCAL3D+ Car | mIoU | 80 | 7 |
| 3D Reconstruction | CUB 41 (test) | mIoU | 75.7 | 6 |
| 3D Object Reconstruction | ShapeNet Car | L1 Loss (Texture) | 0.0218 | 2 |
| 3D Reconstruction | CUB bird dataset | Texture L1 Loss | 0.043 | 2 |
| 3D Reconstruction | Original View Images | LPIPS | 0.33 | 2 |