High Quality Monocular Depth Estimation via Transfer Learning
About
Accurate depth estimation from images is a fundamental task in many applications including scene understanding and reconstruction. Existing solutions for depth estimation often produce blurry approximations of low resolution. This paper presents a convolutional neural network for computing a high-resolution depth map given a single RGB image with the help of transfer learning. Following a standard encoder-decoder architecture, we leverage features extracted using high-performing pre-trained networks when initializing our encoder, along with augmentation and training strategies that lead to more accurate results. We show how, even for a very simple decoder, our method is able to achieve detailed high-resolution depth maps. Our network, with fewer parameters and training iterations, outperforms the state of the art on two datasets and also produces qualitatively better results that capture object boundaries more faithfully. Code and corresponding pre-trained weights are made publicly available.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Depth Estimation | NYU v2 (test) | Threshold Accuracy (δ < 1.25) | 84.6 | 423 |
| Depth Estimation | NYU Depth V2 | RMSE | 0.39 | 177 |
| Depth Prediction | NYU Depth V2 (test) | Accuracy (δ < 1.25) | 89.5 | 113 |
| Monocular Depth Estimation | KITTI (test) | -- | -- | 103 |
| Monocular Depth Estimation | NYU Depth Eigen v2 (test) | A.Rel | 0.093 | 49 |
| Depth Completion | ClearGrasp (test) | MAE (m) | 0.26 | 14 |
| Depth Estimation | DIODE Indoor | Relative Error (REL) | 0.6599 | 13 |
| Monocular Depth Estimation | NYU Depth V2 (test) | mSSIM | 96.8 | 3 |
| Monocular Depth Estimation | Unreal-1k | Delta 1 | 54.4 | 3 |
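
Several of the metrics in the table (threshold accuracy at δ < 1.25, RMSE, absolute relative error) have commonly used definitions in the depth estimation literature. The sketch below computes them in plain Python under the assumption that these benchmarks follow the usual definitions; the function name `depth_metrics` and the sample values are hypothetical and are not taken from the paper or its released code.

```python
import math


def depth_metrics(pred, gt, threshold=1.25):
    """Compare predicted and ground-truth depths.

    `pred` and `gt` are equal-length sequences of positive depth values
    (for example, a depth map flattened over its valid pixels).
    Assumes the commonly used metric definitions; a given benchmark may
    apply additional masking, cropping, or scaling.
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    if any(p <= 0 or g <= 0 for p, g in zip(pred, gt)):
        raise ValueError("depth values must be positive")

    n = len(gt)

    # Threshold accuracy: fraction of pixels whose ratio
    # max(pred/gt, gt/pred) falls below the threshold.
    ratios = [max(p / g, g / p) for p, g in zip(pred, gt)]
    delta1 = sum(1 for r in ratios if r < threshold) / n

    # Root mean squared error, in the same units as the depths.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / n)

    # Absolute relative error: mean of |pred - gt| / gt.
    abs_rel = sum(abs(p - g) / g for p, g in zip(pred, gt)) / n

    return {"delta1": delta1, "rmse": rmse, "abs_rel": abs_rel}


# Hypothetical depths in meters, for illustration only.
gt = [1.0, 2.0, 4.0, 5.0]
pred = [1.0, 2.2, 6.0, 5.0]
metrics = depth_metrics(pred, gt)
```

Here three of the four predictions fall within a factor of 1.25 of the ground truth, so `metrics["delta1"]` is 0.75. Benchmark tables typically report this value as a percentage, while RMSE and relative error are "lower is better" quantities.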