Disentangling Object Motion and Occlusion for Unsupervised Multi-frame Monocular Depth
About
Conventional self-supervised monocular depth prediction methods are based on a static environment assumption, which leads to accuracy degradation in dynamic scenes due to the mismatch and occlusion problems introduced by object motion. Existing methods that focus on dynamic objects only partially solve the mismatch problem, and only at the training loss level. In this paper, we propose a novel multi-frame monocular depth prediction method that addresses these problems at both the prediction and supervision loss levels. Our method, called DynamicDepth, is a new framework trained via a self-supervised cycle-consistent learning scheme. A Dynamic Object Motion Disentanglement (DOMD) module is proposed to disentangle object motion and solve the mismatch problem. Moreover, a novel occlusion-aware Cost Volume and Re-projection Loss are designed to alleviate the occlusion effects of object motion. Extensive analyses and experiments on the Cityscapes and KITTI datasets show that our method significantly outperforms state-of-the-art monocular depth prediction methods, especially in areas containing dynamic objects. Code is available at https://github.com/AutoAILab/DynamicDepth
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel 0.096 | 502 |
| Depth Estimation | KITTI (Eigen split) | RMSE 4.458 | 276 |
| Monocular Depth Estimation | KITTI (Eigen split) | Abs Rel 0.096 | 193 |
| Monocular Depth Estimation | KITTI | Abs Rel 0.096 | 161 |
| Monocular Depth Estimation | KITTI Improved GT (Eigen) | Abs Rel 0.068 | 92 |
| Monocular Depth Estimation | KITTI improved ground truth (Eigen split) | Abs Rel 0.068 | 65 |
| Monocular Depth Estimation | Cityscapes | Accuracy (delta < 1.25) 89.5 | 62 |
| Depth Prediction | Cityscapes (test) | RMSE 5.867 | 52 |
| Depth Estimation | Cityscapes (test) | -- | 40 |
| Depth Prediction | KITTI original ground truth (test) | Abs Rel 0.096 | 38 |
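
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how Abs Rel, RMSE, and the delta < 1.25 accuracy are commonly defined in the depth estimation literature. It is not taken from the DynamicDepth code; the function name `depth_metrics` and the flat-list input format are assumptions made for illustration, and real evaluation pipelines typically also apply depth caps, cropping, and median scaling that are omitted here.

```python
import math


def depth_metrics(pred, gt, delta_threshold=1.25):
    """Return (abs_rel, rmse, delta_acc) over pixels with valid depth.

    pred, gt: flat sequences of predicted and ground-truth depths.
    Pixels with non-positive ground truth or prediction are skipped
    (an assumption; benchmarks differ in how they mask invalid pixels).
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)

    # Abs Rel: mean of |pred - gt| / gt (lower is better).
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n

    # RMSE: square root of the mean squared error (lower is better).
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    # Delta accuracy: fraction of pixels whose ratio max(pred/gt, gt/pred)
    # is below the threshold (higher is better).
    delta_acc = sum(
        1 for p, g in pairs if max(p / g, g / p) < delta_threshold
    ) / n

    return abs_rel, rmse, delta_acc


if __name__ == "__main__":
    predicted = [10.0, 20.0, 30.0, 5.0]
    ground_truth = [10.0, 25.0, 20.0, 0.0]  # last pixel has no ground truth
    print(depth_metrics(predicted, ground_truth))
```

Note that this function returns the delta accuracy as a fraction in [0, 1], whereas the table reports it as a percentage.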