
Unsupervised Learning of Depth and Ego-Motion from Video

About

We present an unsupervised learning framework for monocular depth and camera motion estimation from unstructured video sequences. We achieve this by simultaneously training depth and camera pose estimation networks, using the task of view synthesis as the supervisory signal. The networks are thus coupled via the view synthesis objective during training, but can be applied independently at test time. Empirical evaluation on the KITTI dataset demonstrates the effectiveness of our approach: 1) monocular depth performs comparably with supervised methods that use either ground-truth pose or depth for training, and 2) pose estimation performs favorably compared to established SLAM systems under comparable input settings.
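The view-synthesis signal rests on projective warping: each target pixel is back-projected with the predicted depth, transformed by the predicted relative pose, and re-projected into the source view, where the source image is sampled and compared photometrically to the target. The following is a minimal NumPy sketch of that warping step under standard pinhole assumptions; the function name, argument shapes, and intrinsics handling are illustrative, not the authors' implementation.

```python
import numpy as np

def view_synthesis_warp(depth, pose, K):
    """Map target-view pixels into the source view using predicted depth
    and relative camera pose (the geometric core of view synthesis).

    depth: (H, W) predicted depth for the target frame
    pose:  (4, 4) target-to-source rigid transform [R|t]
    K:     (3, 3) camera intrinsics
    Returns (H, W, 2) source-view pixel coordinates (x, y).
    """
    H, W = depth.shape
    # Homogeneous pixel grid in the target view.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, H*W)
    # Back-project to 3-D camera coordinates: X = D * K^-1 * p.
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    # Transform into the source frame and re-project: p_s ~ K * (T * X).
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    src = K @ (pose @ cam_h)[:3]
    # Perspective divide, then reshape back to image layout.
    return (src[:2] / src[2]).T.reshape(H, W, 2)
```

With an identity pose the warp is the identity map, as expected; in training, the returned coordinates would drive differentiable bilinear sampling of the source image, and the photometric error between the synthesized and target images supervises both networks.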

Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe · 2017

Related benchmarks

Task                       | Dataset                         | Metric                        | Result | Rank
Monocular Depth Estimation | KITTI (Eigen)                   | Abs Rel                       | 0.183  | 502
Depth Estimation           | NYU v2 (test)                   | Threshold Accuracy (δ < 1.25) | 67.4   | 423
Depth Estimation           | KITTI (Eigen split)             | RMSE                          | 6.709  | 276
Surface Normal Estimation  | NYU v2 (test)                   | Mean Angle Distance (MAD)     | 43.5   | 206
Monocular Depth Estimation | KITTI                           | Abs Rel                       | 0.208  | 161
Monocular Depth Estimation | KITTI Raw Eigen (test)          | RMSE                          | 4.975  | 159
Monocular Depth Estimation | Make3D (test)                   | Abs Rel                       | 0.383  | 132
Monocular Depth Estimation | KITTI 80m maximum depth (Eigen) | Abs Rel                       | 0.121  | 126
Monocular Depth Estimation | KITTI 2015 (Eigen split)        | Abs Rel                       | 0.183  | 95
Depth Estimation           | KITTI                           | Abs Rel                       | 0.13   | 92

(Showing 10 of 50 rows)
