DeepV2D: Video to Depth with Differentiable Structure from Motion
About
We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation are alternated and converge to an accurate depth estimate. Code is available at https://github.com/princeton-vl/DeepV2D.
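The alternating inference described above can be sketched as a simple fixed-point loop. This is an illustrative toy, not the authors' implementation: `motion_step` and `depth_step` below are trivial stand-ins for DeepV2D's learned motion and depth modules, and all names here are hypothetical.

```python
import numpy as np

def motion_step(frames, depth):
    # Hypothetical stand-in for the learned motion module, which in
    # DeepV2D refines camera poses given the current depth estimate.
    return [f.mean() * depth.mean() for f in frames]

def depth_step(frames, poses):
    # Hypothetical stand-in for the learned depth module, which
    # re-estimates keyframe depth given the current camera poses.
    return np.full_like(frames[0], 1.0 + 0.1 * np.mean(poses))

def deepv2d_inference(frames, num_iters=8, tol=1e-6):
    """Alternate motion and depth estimation until depth converges."""
    depth = np.ones_like(frames[0])            # flat depth initialization
    poses = None
    for _ in range(num_iters):
        poses = motion_step(frames, depth)     # motion step: hold depth fixed
        new_depth = depth_step(frames, poses)  # depth step: hold poses fixed
        if np.abs(new_depth - depth).mean() < tol:
            return new_depth, poses            # converged
        depth = new_depth
    return depth, poses

frames = [np.random.rand(4, 4) for _ in range(3)]
depth, poses = deepv2d_inference(frames)
```

The key point is structural: each stage consumes the other's latest output, and because both stages are differentiable modules, the whole loop can be trained end to end.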
Zachary Teed, Jia Deng • 2018
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Depth Estimation | KITTI (Eigen split) | RMSE | 2.483 | 291 |
| Monocular Depth Estimation | KITTI (test) | Abs Rel Error | 0.037 | 103 |
| 3D Reconstruction | 7 Scenes | -- | -- | 94 |
| Depth Estimation | ScanNet (test) | Abs Rel | 0.057 | 65 |
| Visual-Inertial Odometry | EuRoC (All sequences) | MH1 Error | 0.739 | 62 |
| Video Depth Estimation | Sintel (test) | Delta 1 Accuracy | 50.9 | 61 |
| Visual Odometry | TUM-RGBD | freiburg1/desk2 Error | 0.633 | 37 |
| Absolute Trajectory Estimation | TUM RGB-D | Desk Error | 0.166 | 36 |
| Camera pose estimation | TUM freiburg1 | Rotation Error | 0.105 | 34 |
| Camera pose estimation | TUM RGB-D | Error (desk) | 0.166 | 26 |
Showing 10 of 37 benchmark rows.