
Learning the Depths of Moving People by Watching Frozen People

About

We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving. Existing methods for recovering depth for dynamic, non-rigid objects from monocular video impose strong assumptions on the objects' motion and may only recover sparse depth. In this paper, we take a data-driven approach and learn human depth priors from a new source of data: thousands of Internet videos of people imitating mannequins, i.e., freezing in diverse, natural poses, while a hand-held camera tours the scene. Because people are stationary, training data can be generated using multi-view stereo reconstruction. At inference time, our method uses motion parallax cues from the static areas of the scenes to guide the depth prediction. We demonstrate our method on real-world sequences of complex human actions captured by a moving hand-held camera, show improvement over state-of-the-art monocular depth prediction methods, and show various 3D effects produced using our predicted depth.
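To illustrate why motion parallax from static regions carries depth information, here is a minimal sketch under a strong simplifying assumption: the camera translates purely sideways between two frames, so a static point at depth Z shifts by d = f * B / Z pixels. The function name and the numbers are hypothetical, chosen only for illustration; this is not the paper's pipeline, which handles general camera motion and learns depth for the moving people.

```python
def depth_from_parallax(focal_px, baseline_m, disparity_px):
    """Depth (metres) of a static point from its pixel shift between two views.

    Assumes a pure sideways camera translation of `baseline_m` metres and a
    focal length of `focal_px` pixels (both hypothetical inputs).
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the camera")
    return focal_px * baseline_m / disparity_px


# Nearby static points shift more between frames than distant ones.
near = depth_from_parallax(focal_px=1000.0, baseline_m=0.1, disparity_px=50.0)
far = depth_from_parallax(focal_px=1000.0, baseline_m=0.1, disparity_px=5.0)
print(near, far)
```

This relation holds only for static scene points. A moving person violates it, which is why the abstract describes using parallax from the static areas as a cue rather than as the final answer.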

Zhengqi Li, Tali Dekel, Forrester Cole, Richard Tucker, Noah Snavely, Ce Liu, William T. Freeman • 2019

Related benchmarks

Task                        Dataset  Metric                          Result  Rank
Depth Prediction            ETH3D    AbsRel                          18.1    35
Depth Prediction            Sintel   AbsRel                          0.385   32
Monocular Depth Estimation  DIW      WHDR                            23.15   19
Monocular Depth Estimation  TUM      Accuracy (delta <= 1.25)        29.54   9
Monocular Depth Estimation  NYU      Threshold Error (delta > 1.25)  18.57   9
Monocular Depth Estimation  KITTI    Error Rate (> 1.25)             36.29   9
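For readers unfamiliar with the metric names in the benchmark rows above, the following is a minimal sketch of two of them using their commonly cited definitions: AbsRel as the mean of |pred - gt| / gt, and the delta threshold as the fraction of pixels where max(pred / gt, gt / pred) falls below 1.25. These definitions are an assumption here. The page does not say how each benchmark computes its numbers, whether the comparison is strict, or whether values are reported as fractions or percentages. Function names and sample values are hypothetical.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels whose depth ratio max(p/g, g/p) is below `threshold`.

    Assumes predictions are strictly positive.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Hypothetical depths in metres for four pixels.
pred = [1.0, 2.0, 3.0, 10.0]
gt = [1.0, 2.0, 4.0, 5.0]

print("AbsRel:", abs_rel(pred, gt))
print("delta < 1.25 accuracy:", delta_accuracy(pred, gt))
print("delta >= 1.25 error rate:", 1.0 - delta_accuracy(pred, gt))
```

Under these definitions the threshold error rate is simply the complement of the threshold accuracy, so lower is better for the former and higher is better for the latter.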
