Unsupervised Scale-consistent Depth Learning from Video
About
We propose a monocular depth estimator, SC-Depth, which requires only unlabelled videos for training and enables scale-consistent prediction at inference time. Our contributions include: (i) we propose a geometry consistency loss, which penalizes inconsistency between the depths predicted for adjacent views; (ii) we propose a self-discovered mask that automatically localizes moving objects, which violate the underlying static-scene assumption and cause noisy training signals; (iii) we demonstrate the efficacy of each component with a detailed ablation study and show high-quality depth estimation results on both the KITTI and NYUv2 datasets. Moreover, thanks to the scale-consistent prediction capability, our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system for more robust and accurate tracking. The proposed hybrid Pseudo-RGBD SLAM shows compelling results on KITTI and generalizes well to the KAIST dataset without additional training. Finally, we provide several demos for qualitative evaluation.
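The geometry consistency loss and the self-discovered mask described in (i) and (ii) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes `d_proj` is the depth map of one frame warped into an adjacent view and `d_interp` is the adjacent frame's own predicted depth sampled at the warped pixel coordinates (both hypothetical names); the normalized depth difference drives the loss, and its complement serves as the per-pixel mask that down-weights moving objects.

```python
import numpy as np

def geometry_consistency(d_proj, d_interp, eps=1e-7):
    """Sketch of a geometry consistency term between two depth maps.

    d_proj:   depth of frame A projected into frame B's view
    d_interp: frame B's predicted depth, sampled at the warped coordinates

    Returns the scalar consistency loss and a per-pixel weight mask.
    The normalized difference lies in [0, 1), so the mask does too.
    """
    # Normalized depth inconsistency; symmetric in the two depth maps.
    diff = np.abs(d_proj - d_interp) / (d_proj + d_interp + eps)
    loss = diff.mean()        # geometry consistency loss
    mask = 1.0 - diff         # self-discovered mask: near 0 where depths disagree
    return loss, mask

# Toy example: identical depths give zero loss and a mask of ones.
d = np.full((4, 4), 2.0)
loss, mask = geometry_consistency(d, d.copy())
```

In training, the mask would weight the photometric loss so that pixels on moving objects (where the two depths disagree after warping) contribute less.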
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel | 0.119 | 502 |
| Depth Estimation | NYU v2 (test) | Threshold Accuracy (delta < 1.25) | 81.3 | 423 |
| Monocular Depth Estimation | KITTI | Abs Rel | 0.114 | 161 |
| Monocular Depth Estimation | KITTI Improved GT (Eigen) | Abs Rel | 0.119 | 92 |
| Depth Estimation | ScanNet (test) | Abs Rel | 0.169 | 65 |
| Single-view depth estimation | NYUv2 36 (test) | Abs Rel | 0.159 | 21 |
| Single-view depth estimation | NYU official 654 images v2 (test) | Abs Rel | 0.159 | 21 |
| Visual Odometry | KITTI Seq. 10 | Translational Error (%) | 3.82 | 20 |
| Visual Odometry | KITTI Seq. 09 | Translational Error (%) | 5.08 | 20 |
| Monocular Depth Estimation | DDAD | Abs Rel | 0.169 | 17 |