Digging Into Self-Supervised Monocular Depth Estimation
About
Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high-quality, state-of-the-art results on the KITTI benchmark.
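As a rough illustration of contributions (i) and (iii), the sketch below takes a per-pixel minimum over the reprojection errors of several source frames, then masks out pixels where an unwarped source frame already matches the target at least as well as any warped one. It is a minimal NumPy sketch under stated assumptions, not the paper's implementation: a plain L1 error stands in for the photometric error, the warped source frames are assumed to be computed elsewhere, and the names `photometric_error` and `min_reprojection_loss` are hypothetical.

```python
import numpy as np


def photometric_error(a, b):
    """Per-pixel L1 error averaged over channels: (H, W, C) -> (H, W).

    Assumption: plain L1 is used here only as a simple stand-in.
    """
    return np.abs(a - b).mean(axis=-1)


def min_reprojection_loss(target, warped_sources, raw_sources):
    """Hypothetical helper combining a per-pixel minimum with auto-masking.

    target:         (H, W, C) target frame
    warped_sources: list of (H, W, C) source frames warped into the target view
    raw_sources:    list of (H, W, C) the same source frames, unwarped
    """
    # Error of every warped source frame against the target: (N, H, W).
    reproj = np.stack([photometric_error(target, w) for w in warped_sources])
    # Error of every unwarped source frame against the target: (N, H, W).
    identity = np.stack([photometric_error(target, s) for s in raw_sources])

    # (i) Per-pixel minimum over source frames, so a pixel occluded in one
    # source can still be explained by another.
    min_reproj = reproj.min(axis=0)
    min_identity = identity.min(axis=0)

    # (iii) Keep only pixels where warping helps. Pixels that look the same
    # without warping (static camera, objects moving with the camera) are
    # dropped from the loss.
    mask = min_reproj < min_identity

    # Average over the kept pixels; guard against an empty mask.
    loss = (min_reproj * mask).sum() / max(int(mask.sum()), 1)
    return loss, mask
```

Taking the minimum rather than the mean across source frames is the design choice that makes the loss tolerant of occlusion: a pixel only needs to be visible in one source frame to receive a low error.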
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel | 0.106 | 502 |
| Depth Estimation | NYU v2 (test) | Threshold Accuracy (delta < 1.25) | 77.1 | 423 |
| Depth Estimation | KITTI (Eigen split) | RMSE | 4.577 | 276 |
| Monocular Depth Estimation | NYU v2 (test) | Abs Rel | 0.171 | 257 |
| Surface Normal Estimation | NYU v2 (test) | Mean Angle Distance (MAD) | 43.8 | 206 |
| Monocular Depth Estimation | KITTI (Eigen split) | Abs Rel | 0.106 | 193 |
| Monocular Depth Estimation | KITTI | Abs Rel | 0.115 | 161 |
| Monocular Depth Estimation | KITTI Raw Eigen (test) | RMSE | 4.701 | 159 |
| Monocular Depth Estimation | Make3D (test) | Abs Rel | 0.321 | 132 |
| Monocular Depth Estimation | KITTI 80m maximum depth (Eigen) | Abs Rel | 0.106 | 126 |
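For readers unfamiliar with the depth metrics reported above, the following sketch computes Abs Rel, RMSE, and threshold accuracy (delta < 1.25) using their standard definitions. The function name `depth_metrics` is hypothetical, and two assumptions are made: pixels with non-positive ground-truth depth are treated as invalid, and predictions are strictly positive. Benchmark-specific steps such as depth capping, cropping, or median scaling are not included, so the outputs are not directly comparable to the table.

```python
import numpy as np


def depth_metrics(pred, gt):
    """Standard depth-estimation metrics over valid pixels.

    Assumptions: gt <= 0 marks a missing measurement; pred > 0 everywhere.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Evaluate only where ground-truth depth is available.
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]

    # Abs Rel: mean absolute error relative to the true depth.
    abs_rel = np.mean(np.abs(pred - gt) / gt)

    # RMSE: root mean squared error, in the units of the depth maps.
    rmse = np.sqrt(np.mean((pred - gt) ** 2))

    # Threshold accuracy: fraction of pixels whose ratio to the true
    # depth (in either direction) is below 1.25.
    ratio = np.maximum(pred / gt, gt / pred)
    delta_1 = np.mean(ratio < 1.25)

    return {"abs_rel": abs_rel, "rmse": rmse, "delta_1.25": delta_1}
```

The function returns threshold accuracy as a fraction in [0, 1]; the table reports it as a percentage. Lower is better for Abs Rel and RMSE, and higher is better for threshold accuracy.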