The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth

About

Self-supervised monocular depth estimation networks are trained to predict scene depth using nearby frames as a supervision signal during training. However, for many applications, sequence information in the form of video frames is also available at test time. The vast majority of monocular networks do not make use of this extra signal, thus ignoring valuable information that could be used to improve the predicted depth. Those that do either use computationally expensive test-time refinement techniques or off-the-shelf recurrent networks, which only indirectly make use of the geometric information that is inherently available. We propose ManyDepth, an adaptive approach to dense depth estimation that can make use of sequence information at test time, when it is available. Taking inspiration from multi-view stereo, we propose a deep end-to-end cost volume based approach that is trained using self-supervision only. We present a novel consistency loss that encourages the network to ignore the cost volume when it is deemed unreliable, e.g., in the case of moving objects, and an augmentation scheme to cope with static cameras. Our detailed experiments on both KITTI and Cityscapes show that we outperform all published self-supervised baselines, including those that use single or multiple frames at test time.
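To make the multi-view-stereo idea concrete, the sketch below builds a plane-sweep cost volume between a target frame and one other frame: each target pixel is hypothesised to lie at each of D candidate depths, reprojected into the source frame, and scored by how well the features match there. This is a minimal NumPy sketch under stated assumptions (a pinhole camera with known intrinsics `K`, a known relative pose, nearest-neighbour sampling, and mean absolute feature difference as the matching cost). `build_cost_volume` and its arguments are hypothetical names, and the code is not the authors' ManyDepth implementation.

```python
import numpy as np


def build_cost_volume(target_feat, source_feat, K, T_target_to_source, depth_bins):
    """Plane-sweep cost volume between two frames (illustrative sketch).

    target_feat, source_feat: (C, H, W) feature maps of the two frames.
    K: (3, 3) pinhole intrinsics at feature-map resolution.
    T_target_to_source: (4, 4) rigid transform mapping target-camera points
        into the source camera frame.
    depth_bins: (D,) candidate depths (hypothetical, e.g. linearly spaced).
    Returns a (D, H, W) array: lower cost means a better feature match,
    and np.inf marks hypotheses that project outside the source frame.
    """
    C, H, W = target_feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Homogeneous pixel coordinates (u, v, 1) for every target pixel.
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])
    rays = np.linalg.inv(K) @ pix  # back-projected rays at unit depth
    R, t = T_target_to_source[:3, :3], T_target_to_source[:3, 3:4]
    target_flat = target_feat.reshape(C, -1)

    cost = np.empty((len(depth_bins), H, W))
    for i, d in enumerate(depth_bins):
        # Place every target pixel at depth d, move it into the source camera, project.
        proj = K @ (R @ (rays * d) + t)
        z = proj[2]
        in_front = z > 1e-6
        safe_z = np.where(in_front, z, 1.0)
        # Nearest-neighbour sampling (a simplification; bilinear is more usual).
        u = np.round(proj[0] / safe_z).astype(int)
        v = np.round(proj[1] / safe_z).astype(int)
        valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)

        cost_i = np.full(H * W, np.inf)
        warped = source_feat[:, v[valid], u[valid]]  # (C, n_valid)
        # Matching cost: mean absolute feature difference over channels.
        cost_i[valid] = np.abs(target_flat[:, valid] - warped).mean(axis=0)
        cost[i] = cost_i.reshape(H, W)
    return cost
```

Taking `depth_bins[np.argmin(cost, axis=0)]` gives a per-pixel depth guess from matching alone. With a static camera (`R` = identity, `t` = 0) every hypothesis lands on the same source pixel, so the cost is identical across depths and carries no depth information; on independently moving objects the best-matching depth is simply wrong. These are the two failure cases that the consistency loss and the static-camera augmentation described above are meant to handle.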

Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel Brostow, Michael Firman (2021)

Related benchmarks

Task | Dataset | Metric | Result | Rank
Monocular Depth Estimation | KITTI (Eigen) | Abs Rel | 0.087 | 502
Depth Estimation | KITTI (Eigen split) | RMSE (m) | 2.747 | 276
Monocular Depth Estimation | KITTI (Eigen split) | Abs Rel | 0.06 | 193
Monocular Depth Estimation | DDAD (test) | RMSE (m) | 13.899 | 122
Monocular Depth Estimation | KITTI (test) | Abs Rel | 0.087 | 103
Monocular Depth Estimation | KITTI improved GT (Eigen) | Abs Rel | 0.07 | 92
Depth Estimation | KITTI | Abs Rel | 0.091 | 92
Monocular Depth Estimation | KITTI improved ground truth (Eigen split) | Abs Rel | 0.055 | 65
Monocular Depth Estimation | Cityscapes | Accuracy, δ < 1.25 (%) | 87.5 | 62
Depth Prediction | Cityscapes (test) | RMSE (m) | 6.223 | 52

(10 of 40 benchmark rows shown.)
