Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
About
Predicting depth is an essential component in understanding the 3D geometry of a scene. While for stereo images local correspondence suffices for estimation, finding depth relations from a single image is less straightforward, requiring integration of both global and local information from various cues. Moreover, the task is inherently ambiguous, with a large source of uncertainty coming from the overall scale. In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. We also apply a scale-invariant error to help measure depth relations rather than scale. By leveraging the raw datasets as large sources of training data, our method achieves state-of-the-art results on both NYU Depth and KITTI, and matches detailed depth boundaries without the need for superpixelation.
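The abstract mentions a scale-invariant error but does not give its formula. The following is a minimal sketch, assuming the error is computed on per-pixel log-depth differences with the squared mean difference subtracted, so that multiplying every predicted depth by a constant leaves the value unchanged. The function name `scale_invariant_error` and the weighting parameter `lam` are illustrative choices, not names taken from the text above.

```python
import math


def scale_invariant_error(pred, target, lam=1.0):
    """Scale-invariant error in log space (illustrative sketch).

    pred, target: equal-length sequences of positive depth values.
    lam: weight on the squared-mean term (assumed parameter);
         lam=1.0 is fully scale-invariant, lam=0.0 is plain mean
         squared log error.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal length")

    # Per-pixel difference of log depths.
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(d)

    mean_of_squares = sum(x * x for x in d) / n
    square_of_mean = (sum(d) / n) ** 2

    # A global scale factor shifts every d by the same constant,
    # which the second term cancels when lam == 1.
    return mean_of_squares - lam * square_of_mean
```

With `lam=1.0`, a prediction that is wrong only by a global factor (for example, every depth three times too large) scores approximately zero, which matches the stated goal of measuring depth relations rather than absolute scale.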
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel | 0.203 | 502 |
| Depth Estimation | NYU v2 (test) | Threshold Accuracy (delta < 1.25) | 76.9 | 423 |
| Depth Estimation | KITTI (Eigen split) | RMSE | 6.307 | 276 |
| Monocular Depth Estimation | NYU v2 (test) | Abs Rel | 0.158 | 257 |
| Depth Estimation | NYU Depth V2 | RMSE | 0.907 | 177 |
| Monocular Depth Estimation | KITTI | Abs Rel | 0.203 | 161 |
| Monocular Depth Estimation | KITTI Raw Eigen (test) | RMSE | 6.307 | 159 |
| Monocular Depth Estimation | KITTI 80m maximum depth (Eigen) | Abs Rel | 0.203 | 126 |
| Depth Prediction | NYU Depth V2 (test) | Accuracy (δ < 1.25) | 76.9 | 113 |
| Monocular Depth Estimation | KITTI (test) | Abs Rel Error | 0.203 | 103 |
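
The metrics named in the table (Abs Rel, RMSE, and threshold accuracy at 1.25) are not defined on this page. The sketch below uses the definitions commonly applied in depth-estimation benchmarks; the function name `depth_metrics` and the returned keys are hypothetical, and the exact evaluation protocol behind the numbers above (masking, depth caps, cropping) is not reproduced here.

```python
import math


def depth_metrics(pred, target, threshold=1.25):
    """Common depth-estimation metrics over paired positive depths.

    Returns a dict with:
      abs_rel: mean of |pred - target| / target
      rmse:    square root of the mean squared error
      delta:   fraction of pixels with max(pred/target, target/pred) < threshold
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal length")

    n = len(target)
    pairs = list(zip(pred, target))

    abs_rel = sum(abs(p - t) / t for p, t in pairs) / n
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in pairs) / n)
    delta = sum(1 for p, t in pairs if max(p / t, t / p) < threshold) / n

    return {"abs_rel": abs_rel, "rmse": rmse, "delta": delta}
```

Note that `delta` is returned as a fraction in [0, 1], whereas the table appears to report threshold accuracy as a percentage.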