Learning Single-Image Depth from Videos using Quality Assessment Networks
About
Depth estimation from a single image in the wild remains a challenging problem. One main obstacle is the lack of high-quality training data for images in the wild. In this paper we propose a method to automatically generate such data through Structure-from-Motion (SfM) on Internet videos. The core of this method is a Quality Assessment Network that identifies high-quality reconstructions obtained from SfM. Using this method, we collect single-view depth training data from a large number of YouTube videos and construct a new dataset called YouTube3D. Experiments show that YouTube3D is useful in training depth estimation networks and advances the state of the art of single-view depth estimation in the wild.
Weifeng Chen, Shengyi Qian, Jia Deng · 2018
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | KITTI | AbsRel | 0.327 | 161 |
| Depth Estimation | KITTI | AbsRel | 37.9 | 92 |
| Monocular Depth Estimation | ScanNet | AbsRel | 16.5 | 64 |
| Depth Estimation | DIODE | Delta-1 Accuracy | 66 | 62 |
| Depth Prediction | ETH3D | AbsRel | 23.7 | 35 |
| Depth Prediction | Sintel | -- | -- | 32 |
| 2D Depth Estimation | ScanNet | AbsRel | 23.7 | 26 |
| Monocular Depth Estimation | NYU | AbsRel | 16.6 | 21 |
| Depth Prediction | NYU | Delta-1 Accuracy | 77.3 | 16 |
| Depth Prediction | YT3D | AbsRel | 20.9 | 9 |
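The table above reports two standard single-view depth metrics: AbsRel (mean absolute relative error, lower is better) and Delta-1 accuracy (fraction of points whose predicted-to-ground-truth depth ratio is within a factor of 1.25, higher is better). A minimal sketch of these standard formulas, with illustrative function names:

```python
def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|p - g| / g) over valid depths."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)

def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of points with max(p/g, g/p) < threshold.

    threshold=1.25 gives the Delta-1 metric; 1.25**2 and 1.25**3
    give Delta-2 and Delta-3 in the usual benchmark protocol.
    """
    return sum(max(p / g, g / p) < threshold for p, g in zip(pred, gt)) / len(gt)
```

In benchmark practice these are computed per pixel over a validity mask (ground-truth depth > 0), often after median scaling of the prediction, since single-image depth is only recoverable up to scale.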