Robust Consistent Video Depth Estimation

About

We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video. We integrate a learning-based depth prior, in the form of a convolutional neural network trained for single-image depth estimation, with geometric optimization, to estimate a smooth camera trajectory as well as detailed and stable depth reconstruction. Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details. In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations. Our method quantitatively outperforms state-of-the-art methods on the Sintel benchmark for both depth and pose estimation and attains favorable qualitative results across diverse wild datasets.

Johannes Kopf, Xuejian Rong, Jia-Bin Huang • 2020

Related benchmarks

Task	Dataset	Metric	Value	
Video Depth Estimation	Sintel	Delta Threshold Accuracy (1.25)	47.8	235
Camera pose estimation	TUM-dynamic	ATE	0.153	205
Camera pose estimation	Sintel	ATE	0.274	203
Camera pose estimation	ScanNet	RPE (t)	0.064	133
Camera pose estimation	TUM dynamics	ATE	0.153	90
Video Depth Estimation	Sintel (test)	Delta 1 Accuracy	67.3	61
Camera pose estimation	Sintel ~50 frames	ATE	0.36	41
Camera pose estimation	ScanNet static indoor scenes	ATE	0.227	40
Pose Estimation	BONN	ATE	0.217	38
Video Depth Estimation	KITTI (test)	Delta1	90.1	25

Showing 10 of 32 rows
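
For readers unfamiliar with the depth metric in the table, the delta threshold accuracy at 1.25 is conventionally the fraction of valid pixels whose predicted-to-true depth ratio (taken in whichever direction is larger) falls below 1.25. The sketch below illustrates that standard definition only. The function name, the flat-list inputs, and the rule of skipping non-positive depths are assumptions made for illustration; the exact evaluation protocol behind the numbers above (for example, any scale alignment applied before scoring) is not stated on this page.

```python
def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid depth pairs with max(pred/gt, gt/pred) < threshold.

    pred and gt are flat sequences of depth values (hypothetical input
    layout). Pairs with a non-positive value are treated as invalid and
    skipped, which is an assumption of this sketch.
    """
    hits = 0
    valid = 0
    for p, g in zip(pred, gt):
        if p <= 0 or g <= 0:
            continue  # skip invalid or missing depth
        valid += 1
        if max(p / g, g / p) < threshold:
            hits += 1
    if valid == 0:
        raise ValueError("no valid depth pairs")
    return hits / valid


# Toy data: two pairs within the threshold, one outside, one invalid.
pred = [1.0, 2.0, 3.0, 4.0]
gt = [1.1, 2.0, 4.5, 0.0]
score = delta_accuracy(pred, gt)
print(f"delta < 1.25: {score:.3f}")
```

Benchmark tables commonly report this fraction as a percentage, so a returned value would be multiplied by 100 before comparison with figures like those listed above.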
