Learning Single-Image Depth from Videos using Quality Assessment Networks
About
Depth estimation from a single image in the wild remains a challenging problem. One main obstacle is the lack of high-quality training data for images in the wild. In this paper we propose a method to automatically generate such data through Structure-from-Motion (SfM) on Internet videos. The core of this method is a Quality Assessment Network that identifies high-quality reconstructions obtained from SfM. Using this method, we collect single-view depth training data from a large number of YouTube videos and construct a new dataset called YouTube3D. Experiments show that YouTube3D is useful in training depth estimation networks and advances the state of the art of single-view depth estimation in the wild.
Weifeng Chen, Shengyi Qian, Jia Deng · 2018
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | KITTI | AbsRel | 0.327 | 161 |
| Depth Estimation | KITTI | AbsRel | 37.9 | 92 |
| Monocular Depth Estimation | ScanNet | AbsRel | 16.5 | 64 |
| Depth Estimation | DIODE | Delta-1 Accuracy | 66 | 62 |
| Depth Prediction | ETH3D | AbsRel | 23.7 | 35 |
| Depth Prediction | Sintel | -- | -- | 32 |
| 2D Depth Estimation | ScanNet | AbsRel | 23.7 | 26 |
| Monocular Depth Estimation | NYU | AbsRel | 16.6 | 21 |
| Depth Prediction | NYU | Delta-1 Accuracy | 77.3 | 16 |
| Depth Prediction | YT3D | AbsRel | 20.9 | 9 |
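The table above reports two standard single-view depth metrics: AbsRel (mean absolute relative error, lower is better) and Delta-1 accuracy (fraction of points whose predicted-to-ground-truth depth ratio is within a factor of 1.25, higher is better). A minimal sketch of these standard formulas, with illustrative function names:

```python
def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|p - g| / g) over valid depths."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)

def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of points with max(p/g, g/p) < threshold.

    threshold=1.25 gives the Delta-1 metric; 1.25**2 and 1.25**3
    give Delta-2 and Delta-3 in the usual benchmark protocol.
    """
    return sum(max(p / g, g / p) < threshold for p, g in zip(pred, gt)) / len(gt)
```

In benchmark practice these are computed per pixel over a validity mask (ground-truth depth > 0), often after median scaling of the prediction, since single-image depth is only recoverable up to scale.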