
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras

About

We present a novel method for simultaneous learning of depth, egomotion, object motion, and camera intrinsics from monocular videos, using only consistency across neighboring video frames as the supervision signal. Similarly to prior work, our method learns by applying differentiable warping to frames and comparing the result to adjacent ones, but it provides several improvements: we address occlusions geometrically and differentiably, directly using the depth maps as predicted during training; we introduce randomized layer normalization, a novel and powerful regularizer; and we account for object motion relative to the scene. To the best of our knowledge, our work is the first to learn the camera intrinsic parameters, including lens distortion, from video in an unsupervised manner, thereby allowing us to extract accurate depth and motion from arbitrary videos of unknown origin at scale. We evaluate our results on the Cityscapes, KITTI, and EuRoC datasets, establishing a new state of the art on depth prediction and odometry, and demonstrate qualitatively that depth prediction can be learned from a collection of YouTube videos.
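The core supervision described above, warping one frame toward a neighbor using predicted depth, egomotion, and intrinsics, can be sketched as a pinhole reprojection. The function below is an illustrative NumPy sketch (the function name, argument shapes, and distortion-free model are our assumptions, not the paper's code); the actual method additionally learns the intrinsics and a lens-distortion model, handles occlusions geometrically, and models per-object motion.

```python
import numpy as np

def reproject(depth, K, R, t):
    """Map every pixel of the source frame into the target frame via
    predicted depth and rigid egomotion (R, t), pinhole model only.

    Illustrative sketch: no lens distortion, occlusion handling, or
    object motion, all of which the full method accounts for.
    depth: (H, W) predicted depth map; K: (3, 3) intrinsics.
    Returns (2, H, W) target-frame pixel coordinates.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates, shape (3, H*W).
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    # Back-project to 3D camera coordinates using the predicted depth.
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    # Apply the predicted rigid egomotion.
    cam2 = R @ cam + t.reshape(3, 1)
    # Project into the target frame and divide by depth.
    proj = K @ cam2
    uv = proj[:2] / proj[2:]
    return uv.reshape(2, h, w)
```

Sampling the target frame at these coordinates (differentiably, e.g. with bilinear interpolation) and comparing the result photometrically to the source frame yields the training loss; because every step is differentiable, gradients flow back into depth, egomotion, and the intrinsics themselves.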

Ariel Gordon, Hanhan Li, Rico Jonschkowski, Anelia Angelova • 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel | 0.128 | 502 |
| Depth Estimation | KITTI (Eigen split) | RMSE | 5.232 | 276 |
| Monocular Depth Estimation | KITTI (Eigen split) | Abs Rel | 0.128 | 193 |
| Monocular Depth Estimation | KITTI | Abs Rel | 0.128 | 161 |
| Monocular Depth Estimation | KITTI Raw Eigen (test) | RMSE | 5.12 | 159 |
| Monocular Depth Estimation | Cityscapes | Accuracy (delta < 1.25) | 83 | 62 |
| Depth Prediction | Cityscapes (test) | RMSE | 6.96 | 52 |
| Depth Estimation | Cityscapes (test) | -- | -- | 40 |
| Depth Prediction | KITTI original ground truth (test) | Abs Rel | 0.128 | 38 |
| Depth Prediction | KITTI original (Eigen split) | Abs Rel | 0.128 | 29 |

Showing 10 of 17 rows
