
Unsupervised Scale-consistent Depth Learning from Video

About

We propose SC-Depth, a monocular depth estimator that requires only unlabelled videos for training and produces scale-consistent predictions at inference time. Our contributions are: (i) a geometry consistency loss that penalizes inconsistency between the depths predicted for adjacent views; (ii) a self-discovered mask that automatically localizes moving objects, which violate the underlying static-scene assumption and inject noisy signals during training; (iii) a detailed ablation study demonstrating the efficacy of each component, together with high-quality depth estimation results on both the KITTI and NYUv2 datasets. Moreover, thanks to scale-consistent prediction, our monocularly trained networks integrate readily into the ORB-SLAM2 system for more robust and accurate tracking. The resulting hybrid Pseudo-RGBD SLAM achieves compelling results on KITTI and generalizes well to the KAIST dataset without additional training. Finally, we provide several demos for qualitative evaluation.
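To make contributions (i) and (ii) concrete, here is a minimal NumPy sketch of the core per-pixel computation. It assumes the normalized depth-difference form |D1 − D2| / (D1 + D2) for the geometry consistency loss, with the self-discovered mask defined as 1 minus that difference; the function and variable names are ours, not from the authors' code.

```python
import numpy as np

def geometry_consistency(d_warp, d_interp):
    """Geometry consistency loss and self-discovered mask (sketch).

    d_warp   -- depth map of view a projected into view b's frame
    d_interp -- view b's own predicted depth, sampled at the
                corresponding pixels

    The per-pixel difference |d1 - d2| / (d1 + d2) lies in [0, 1):
    it is 0 where the two predictions agree and approaches 1 where
    they conflict (e.g. on moving objects).
    """
    diff = np.abs(d_warp - d_interp) / (d_warp + d_interp)
    loss = diff.mean()      # geometry consistency loss (averaged)
    mask = 1.0 - diff       # weight in [0, 1]; low on inconsistent pixels
    return loss, mask
```

In training, the mask would down-weight the photometric loss on pixels where the two views' depths disagree, so dynamic regions contribute less noisy gradient.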

Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Zhichao Li, Le Zhang, Chunhua Shen, Ming-Ming Cheng, Ian Reid · 2021
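The Pseudo-RGBD SLAM mentioned above feeds the network's scale-consistent depth to ORB-SLAM2's RGB-D mode. A common way to do that is to pack metric depth into the 16-bit depth-image format of the TUM RGB-D convention (depth in metres × 5000, with 0 marking invalid pixels), which ORB-SLAM2's RGB-D examples consume. The helper below is our own illustrative sketch, not the authors' code; the `max_depth` cutoff is an assumed hyperparameter.

```python
import numpy as np

# TUM RGB-D convention: stored value = depth_in_metres * 5000, 0 = invalid.
DEPTH_SCALE = 5000.0

def depth_to_uint16(depth_m, max_depth=10.0):
    """Pack a float depth map (metres) into a 16-bit depth image.

    Pixels that are non-positive or beyond max_depth are zeroed,
    which ORB-SLAM2 treats as missing depth.
    """
    valid = (depth_m > 0) & (depth_m < max_depth)
    d = np.where(valid, depth_m, 0.0)
    return np.round(d * DEPTH_SCALE).astype(np.uint16)
```

The resulting array can be written out as a 16-bit PNG alongside the RGB frame and passed to the RGB-D front end in place of a real depth sensor.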

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|------|---------|--------|--------|------|
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel | 0.119 | 502 |
| Depth Estimation | NYU v2 (test) | Threshold Accuracy (δ < 1.25) | 81.3 | 423 |
| Monocular Depth Estimation | KITTI | Abs Rel | 0.114 | 161 |
| Monocular Depth Estimation | KITTI Improved GT (Eigen) | Abs Rel | 0.119 | 92 |
| Depth Estimation | ScanNet (test) | Abs Rel | 0.169 | 65 |
| Single-view Depth Estimation | NYUv2 36 (test) | Abs Rel | 0.159 | 21 |
| Single-view Depth Estimation | NYU official 654 images v2 (test) | Abs Rel | 0.159 | 21 |
| Visual Odometry | KITTI Seq. 10 | Translational Error (%) | 3.82 | 20 |
| Visual Odometry | KITTI Seq. 09 | Translational Error (%) | 5.08 | 20 |
| Monocular Depth Estimation | DDAD | Abs Rel | 0.169 | 17 |

Showing 10 of 18 rows.
