Behind the Scenes: Density Fields for Single View Reconstruction
About
Inferring a meaningful geometric scene representation from a single image is a fundamental problem in computer vision. Approaches based on traditional depth map prediction can only reason about areas that are visible in the image. Neural radiance fields (NeRFs), by contrast, can capture true 3D geometry together with color, but are too complex to be generated from a single image. As an alternative, we propose to predict implicit density fields. A density field maps every location in the frustum of the input image to a volumetric density. By sampling color directly from the available views instead of storing color in the density field, our scene representation becomes significantly less complex than a NeRF, and a neural network can predict it in a single forward pass. The prediction network is trained through self-supervision from video data alone. Our formulation allows volume rendering to perform both depth prediction and novel view synthesis. Through experiments, we show that our method is able to predict meaningful geometry for regions that are occluded in the input image. Additionally, we demonstrate the potential of our approach on three datasets for depth prediction and novel view synthesis.
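To make the role of volume rendering concrete, here is a minimal sketch of how density samples along one camera ray could be composited into an expected depth and a color. It uses the standard discrete volume rendering weights and is not taken from the authors' code. The function name `render_ray` and its arguments are hypothetical, and the per-sample colors are assumed to have already been looked up in another view by the caller.

```python
import numpy as np

def render_ray(sigmas, depths, colors=None):
    """Composite density samples along one ray (illustrative sketch only).

    sigmas: (N,) non-negative densities at the sample points (N >= 2).
    depths: (N,) increasing distances of the sample points from the camera.
    colors: optional (N, 3) colors, assumed to be sampled from another view.
    """
    sigmas = np.asarray(sigmas, dtype=float)
    depths = np.asarray(depths, dtype=float)

    # Spacing between consecutive samples; the last interval reuses the previous spacing.
    deltas = np.diff(depths, append=2.0 * depths[-1] - depths[-2])

    # Probability that the ray is absorbed within each interval.
    alphas = 1.0 - np.exp(-sigmas * deltas)

    # Transmittance: probability that the ray reaches sample i without being absorbed earlier.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))

    # Contribution of each sample to the rendered value.
    weights = trans * alphas

    # Expected ray termination depth (not normalized by the total weight).
    depth = float(np.sum(weights * depths))

    color = None
    if colors is not None:
        color = weights @ np.asarray(colors, dtype=float)
    return depth, color, weights

# Toy ray: empty space for the first two samples, then a dense surface.
sigmas = [0.0, 0.0, 50.0, 50.0]
depths = [1.0, 2.0, 3.0, 4.0]
colors = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
depth, color, weights = render_ray(sigmas, depths, colors)
print(depth, color)
```

Because the field stores only density, the same weights serve both tasks: weighting the sample distances gives depth, and weighting colors fetched from another frame gives a synthesized view.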
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Novel View Synthesis | RealEstate 10k (RE10k) (test) | PSNR | 22.9949 | 16 |
| Object Reconstruction | KITTI-360 4-20m (short range evaluation) | Oacc | 92 | 10 |
| Object Reconstruction | KITTI-360 4-50m (long range evaluation) | Object Accuracy | 84 | 10 |
| 3D Occupancy Prediction | SSCBench-KITTI-360 (test) | OAcc | 87 | 5 |
| Novel View Synthesis | Mannequin Challenge (MC) (test) | MAE | 0.0463 | 4 |
| Single-view 3D Scene Reconstruction | DDAD (test) | Oacc | 48 | 3 |