Feature-metric Loss for Self-supervised Learning of Depth and Egomotion
About
Photometric loss is widely used for self-supervised depth and egomotion estimation. However, the loss landscapes induced by photometric differences are often problematic for optimization: pixels in textureless regions produce plateaus, and less discriminative pixels produce multiple local minima. In this work, a feature-metric loss is proposed and defined on a feature representation, which is itself learned in a self-supervised manner and regularized by both first-order and second-order derivatives so that the loss landscapes form proper convergence basins. Comprehensive experiments and detailed analysis via visualization demonstrate the effectiveness of the proposed feature-metric loss. In particular, our method improves the state of the art on KITTI from 0.885 to 0.925 as measured by $\delta_1$ for depth estimation, and significantly outperforms previous methods for visual odometry.
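The plateau problem described above can be illustrated with a toy 1D example. This is a minimal sketch and not the paper's implementation: the signals, the `photometric_loss` helper, and the integer shift standing in for depth and egomotion are all assumptions made for illustration. It compares the per-pixel photometric loss curve of a pixel in a flat region with that of a pixel on a textured ramp.

```python
def photometric_loss(target, source, x, d):
    """Absolute intensity difference between target[x] and source[x + d]."""
    return abs(target[x] - source[x + d])

# Hypothetical 1D scene: a flat (textureless) stretch followed by a ramp (textured).
target = [5.0] * 10 + [5.0 + 2.0 * i for i in range(1, 11)]

# Source view: the same scene shifted right by true_shift pixels,
# with the left border padded using the first intensity value.
true_shift = 2
source = [target[0]] * true_shift + target

def loss_curve(x, candidates):
    """Photometric loss at pixel x for each candidate shift."""
    return [photometric_loss(target, source, x, d) for d in candidates]

candidates = range(0, 5)
flat_curve = loss_curve(3, candidates)   # pixel inside the textureless stretch
ramp_curve = loss_curve(14, candidates)  # pixel on the textured ramp

print("flat pixel:", flat_curve)
print("ramp pixel:", ramp_curve)
```

In this toy setup the flat pixel has the same loss for every candidate shift, so the loss carries no gradient signal toward the true shift, while the ramp pixel has a single minimum at `true_shift`. The feature-metric loss in the abstract targets the first case by comparing learned features instead of raw intensities.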
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel 0.079 | 502 |
| Depth Estimation | KITTI (Eigen split) | RMSE 4.427 | 276 |
| Monocular Depth Estimation | KITTI (Eigen split) | Abs Rel 0.104 | 193 |
| Stereo Matching | KITTI 2015 (test) | -- | 144 |
| Monocular Depth Estimation | DDAD (test) | RMSE 12.45 | 122 |
| Monocular Depth Estimation | KITTI (test) | Abs Rel Error 0.099 | 103 |
| Monocular Depth Estimation | KITTI 2015 (Eigen split) | Abs Rel 0.099 | 95 |
| Stereo Matching | Middlebury (test) | 3PE 8.13 | 47 |
| Stereo Matching | Inria SLFD | 3 Pixel Error 12.97 | 41 |
| Depth Prediction | KITTI original ground truth (test) | Abs Rel 0.099 | 38 |
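
For readers unfamiliar with the depth metrics named in the table and the abstract, the following is a minimal sketch of how Abs Rel, RMSE, and $\delta_1$ are commonly computed over pixels with valid ground truth. The function name `depth_metrics` and the sample values are hypothetical, and benchmark-specific conventions such as depth caps, crops, or scale alignment are not covered here.

```python
import math

def depth_metrics(pred, gt, threshold=1.25):
    """Compute Abs Rel, RMSE, and delta_1 over pixels with positive ground truth.

    pred and gt are flat sequences of predicted and ground-truth depths;
    predictions are assumed to be strictly positive.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    n = len(pairs)
    # Abs Rel: mean of |pred - gt| / gt (lower is better).
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # RMSE: root of the mean squared depth error (lower is better).
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # delta_1: fraction of pixels with max(pred/gt, gt/pred) < 1.25 (higher is better).
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}

# Made-up depths in metres, for illustration only.
metrics = depth_metrics(pred=[10.0, 22.0, 30.0, 8.0], gt=[10.0, 20.0, 20.0, 8.0])
print(metrics)
```

Abs Rel and RMSE are error measures, so smaller values are better, whereas $\delta_1$ is an accuracy measure, so the improvement from 0.885 to 0.925 quoted in the abstract means more pixels fall within the 1.25 ratio threshold.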