Unsupervised Monocular Depth Learning in Dynamic Scenes
About
We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision. We show that this apparently heavily underdetermined problem can be regularized by imposing the following prior knowledge about 3D translation fields: they are sparse, since most of the scene is static, and they tend to be constant for rigid moving objects. We show that this regularization alone is sufficient to train monocular depth prediction models that exceed the accuracy achieved in prior work for dynamic scenes, including methods that require semantic input. Code is available at https://github.com/google-research/google-research/tree/master/depth_and_motion_learning
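
The two priors named in the abstract (sparsity and constancy within rigid objects) can be illustrated with simple penalty terms on a dense translation field. The following is a minimal sketch, not the authors' loss: it assumes a plain L1 magnitude penalty for sparsity and a first-difference penalty for piecewise constancy, and the function names and the 8x8 toy field are hypothetical. The released code linked above should be consulted for the actual formulation.

```python
import numpy as np

def sparsity_penalty(translation):
    """Mean L1 magnitude of an (H, W, 3) field (illustrative assumption).

    The value is small when most pixels have zero translation,
    i.e. when most of the scene is static.
    """
    return float(np.mean(np.abs(translation)))

def smoothness_penalty(translation):
    """Mean absolute difference between neighbouring pixels (illustrative).

    The value is small when the field is constant over regions,
    as expected for rigidly moving objects.
    """
    dy = np.abs(translation[1:, :, :] - translation[:-1, :, :])
    dx = np.abs(translation[:, 1:, :] - translation[:, :-1, :])
    return float(np.mean(dy) + np.mean(dx))

# Hypothetical toy field: static background plus one rigid 4x4 object
# that moves with a single constant translation vector.
rigid_field = np.zeros((8, 8, 3))
rigid_field[2:6, 2:6] = [1.0, 0.0, 0.5]

# An unstructured field of the same shape, for comparison.
rng = np.random.default_rng(0)
noisy_field = rng.normal(size=(8, 8, 3))

print("rigid:", sparsity_penalty(rigid_field), smoothness_penalty(rigid_field))
print("noisy:", sparsity_penalty(noisy_field), smoothness_penalty(noisy_field))
```

Under these assumed penalties, the rigid field scores lower on both terms than the unstructured one, which is the behaviour the regularization is meant to favour.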
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel 0.13 | 502 |
| Depth Estimation | KITTI (Eigen split) | RMSE 5.138 | 276 |
| Monocular Depth Estimation | KITTI | Abs Rel 0.13 | 161 |
| Monocular Depth Estimation | KITTI Raw Eigen (test) | RMSE 5.138 | 159 |
| Monocular Depth Estimation | KITTI Improved GT (Eigen) | Abs Rel 0.13 | 92 |
| Monocular Depth Estimation | Cityscapes | Accuracy (delta < 1.25) 84.6 | 62 |
| Depth Prediction | Cityscapes (test) | RMSE 6.98 | 52 |
| Depth Estimation | Cityscapes (test) | -- | 40 |
| Depth Prediction | KITTI original ground truth (test) | Abs Rel 0.13 | 38 |
| Depth Prediction | KITTI original (Eigen split) | Abs Rel 0.13 | 29 |
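
The metrics reported in the table (Abs Rel, RMSE, and accuracy at delta < 1.25) follow standard definitions in the depth estimation literature. The sketch below computes them with NumPy under two assumptions that are not stated on this page: pixels with zero ground-truth depth are treated as invalid and skipped, and predictions are strictly positive. The function name `depth_metrics` and the sample values are hypothetical, and benchmark-specific details such as depth caps or median scaling are not modeled.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Compute Abs Rel, RMSE, and delta < 1.25 accuracy over valid pixels.

    Assumes gt == 0 marks pixels without ground truth and pred > 0.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)

    # Keep only pixels that have ground-truth depth.
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]

    # Absolute relative error: mean of |pred - gt| / gt.
    abs_rel = np.mean(np.abs(pred - gt) / gt)

    # Root mean squared error, in the same units as the depth values.
    rmse = np.sqrt(np.mean((pred - gt) ** 2))

    # Fraction of pixels whose ratio max(pred/gt, gt/pred) is below 1.25.
    ratio = np.maximum(pred / gt, gt / pred)
    delta_125 = np.mean(ratio < 1.25)

    return {"abs_rel": float(abs_rel),
            "rmse": float(rmse),
            "delta_1.25": float(delta_125)}

# Hypothetical values in metres; the third pixel has no ground truth.
metrics = depth_metrics(pred=[11.0, 18.0, 5.0, 60.0],
                        gt=[10.0, 20.0, 0.0, 40.0])
print(metrics)
```

Lower is better for Abs Rel and RMSE, while higher is better for the delta accuracy; the table reports the latter as a percentage.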