Robust Consistent Video Depth Estimation

About

We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video. We integrate a learning-based depth prior, in the form of a convolutional neural network trained for single-image depth estimation, with geometric optimization to estimate a smooth camera trajectory as well as a detailed and stable depth reconstruction. Our algorithm combines two complementary techniques: (1) flexible deformation splines for low-frequency, large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details. In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations. Our method quantitatively outperforms state-of-the-art methods on the Sintel benchmark for both depth and pose estimation and attains favorable qualitative results across diverse in-the-wild datasets.

Johannes Kopf, Xuejian Rong, Jia-Bin Huang • 2020

Related benchmarks

Task                    Dataset                       Metric                           Result  Rank
Video Depth Estimation  Sintel                        Delta Threshold Accuracy (1.25)  47.8    235
Camera pose estimation  TUM-dynamic                   ATE                              0.153   205
Camera pose estimation  Sintel                        ATE                              0.274   203
Camera pose estimation  ScanNet                       RPE (t)                          0.064   133
Camera pose estimation  TUM dynamics                  ATE                              0.153   90
Video Depth Estimation  Sintel (test)                 Delta 1 Accuracy                 67.3    61
Camera pose estimation  Sintel ~50 frames             ATE                              0.36    41
Camera pose estimation  ScanNet static indoor scenes  ATE                              0.227   40
Pose Estimation         BONN                          ATE                              0.217   38
Video Depth Estimation  KITTI (test)                  Delta1                           90.1    25
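
For readers unfamiliar with the metrics in the table, the sketch below shows how a delta threshold accuracy and an absolute trajectory error (ATE) are commonly computed. It assumes the usual definitions: delta accuracy is the fraction of valid pixels whose ratio max(pred/gt, gt/pred) falls below 1.25, and ATE is the root-mean-square position error between trajectories that have already been aligned. The page does not state how the listed results were produced (for example, which alignment or scaling was applied), so the function names `delta_accuracy` and `ate_rmse` and the toy inputs are illustrative assumptions only.

```python
import math


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels where max(pred/gt, gt/pred) < threshold.

    `pred` and `gt` are flat sequences of depths; pairs with a
    non-positive value are treated as invalid and skipped (an assumption).
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


def ate_rmse(estimated, reference):
    """Root-mean-square position error between two trajectories.

    Both inputs are equal-length sequences of (x, y, z) positions.
    The trajectories are assumed to be aligned beforehand; no alignment
    or scale correction is performed here.
    """
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("trajectories must be non-empty and equal length")
    squared = [
        sum((a - b) ** 2 for a, b in zip(e, r))
        for e, r in zip(estimated, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy data, not taken from any benchmark.
acc = delta_accuracy([1.0, 2.0, 4.0, 1.0], [1.0, 2.0, 2.0, 1.2])
err = ate_rmse([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)])
```

Delta accuracies in the table appear to be reported as percentages, so a value returned by `delta_accuracy` would be multiplied by 100 for comparison. Higher is better for delta accuracy, lower is better for ATE.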
