Unsupervised Monocular Depth Learning in Dynamic Scenes

About

We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision. We show that this apparently heavily underdetermined problem can be regularized by imposing the following prior knowledge about 3D translation fields: they are sparse, since most of the scene is static, and they tend to be constant for rigid moving objects. We show that this regularization alone is sufficient to train monocular depth prediction models that exceed the accuracy achieved in prior work for dynamic scenes, including methods that require semantic input. Code is at https://github.com/google-research/google-research/tree/master/depth_and_motion_learning .
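The two priors named in the abstract (translation fields are sparse, and roughly constant over a rigid moving object) can be illustrated with a small NumPy sketch. This is not the authors' loss: the function names `sparsity_penalty` and `constancy_penalty` are hypothetical, and a plain L1 magnitude and L1 finite-difference penalty stand in for whatever terms the released code actually uses. The field is assumed to be an array of shape (H, W, 3) holding one 3D translation vector per pixel.

```python
import numpy as np


def sparsity_penalty(field):
    """Mean absolute translation; zero when the whole scene is static.

    Stand-in for a sparsity prior (assumption: plain L1, not the paper's term).
    """
    return float(np.mean(np.abs(field)))


def constancy_penalty(field):
    """Mean absolute difference between neighboring pixels.

    Stand-in for the "constant within a rigid object" prior: a field that is
    piecewise constant only pays at object boundaries.
    """
    dy = field[1:, :, :] - field[:-1, :, :]  # vertical neighbors
    dx = field[:, 1:, :] - field[:, :-1, :]  # horizontal neighbors
    return float(np.mean(np.abs(dy)) + np.mean(np.abs(dx)))


# Toy fields: a static scene, one rigid moving box, and unstructured noise.
static = np.zeros((8, 8, 3))
rigid = static.copy()
rigid[2:5, 2:5] = [1.0, 0.0, 0.5]  # one object translating as a whole
noisy = np.random.default_rng(0).normal(size=(8, 8, 3))

for name, f in [("static", static), ("rigid", rigid), ("noisy", noisy)]:
    print(name, sparsity_penalty(f), constancy_penalty(f))
```

Under these stand-in penalties, the static field scores zero on both terms, the single rigid box scores low, and the unstructured field scores highest, which is the ordering the priors are meant to encourage.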

Hanhan Li, Ariel Gordon, Hang Zhao, Vincent Casser, Anelia Angelova • 2020

Related benchmarks

Task	Dataset	Result
Monocular Depth Estimation	KITTI (Eigen)	Abs Rel 0.13	523
Depth Estimation	KITTI (Eigen split)	RMSE 5.138	291
Monocular Depth Estimation	KITTI	Abs Rel 0.13	220
Monocular Depth Estimation	KITTI Raw Eigen (test)	RMSE 5.138	159
Depth Estimation	KITTI	RMSE 5.138	156
Monocular Depth Estimation	KITTI Improved GT (Eigen)	Abs Rel 0.13	111
Monocular Depth Estimation	Cityscapes	Accuracy (delta < 1.25) 84.6	74
Depth Prediction	Cityscapes (test)	RMSE 6.98	52
Depth Estimation	Cityscapes (test)	--	40
Depth Prediction	KITTI original ground truth (test)	Abs Rel 0.13	38

Showing 10 of 16 rows
