Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

About

Predicting depth is an essential component in understanding the 3D geometry of a scene. While for stereo images local correspondence suffices for estimation, finding depth relations from a single image is less straightforward, requiring integration of both global and local information from various cues. Moreover, the task is inherently ambiguous, with a large source of uncertainty coming from the overall scale. In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. We also apply a scale-invariant error to help measure depth relations rather than scale. By leveraging the raw datasets as large sources of training data, our method achieves state-of-the-art results on both NYU Depth and KITTI, and matches detailed depth boundaries without the need for superpixelation.

David Eigen, Christian Puhrsch, Rob Fergus • 2014
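The abstract mentions a scale-invariant error but does not spell it out. As a minimal sketch, assuming the log-space form usually given for this paper, with d_i = log(pred_i) - log(target_i) and error = mean(d_i^2) - lam * mean(d_i)^2, it could be computed as follows. The function name, the `lam` parameter, and the list-based inputs are illustrative assumptions, not taken from this page.

```python
import math


def scale_invariant_log_error(pred, target, lam=1.0):
    """Scale-invariant error in log space (illustrative sketch).

    pred, target: equal-length sequences of positive depth values.
    lam: weight of the scale-cancelling term (assumed parameter name).
         lam=1.0 ignores any global scale factor; lam=0.0 reduces to a
         plain mean squared error on log depths.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal length")

    # Per-pixel difference of log depths.
    diffs = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(diffs)

    mean_sq = sum(d * d for d in diffs) / n  # (1/n) * sum(d_i^2)
    mean = sum(diffs) / n                    # (1/n) * sum(d_i)

    # Subtracting the squared mean cancels a constant log offset,
    # i.e. a global multiplicative scale on the prediction.
    return mean_sq - lam * mean * mean
```

With `lam=1.0`, multiplying every predicted depth by the same constant leaves the error unchanged, which is the sense in which the measure captures depth relations rather than absolute scale.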

Related benchmarks

Task	Dataset	Metric	Value	(unlabeled)
Monocular Depth Estimation	KITTI (Eigen)	Abs Rel	0.203	523
Depth Estimation	NYU v2 (test)	Threshold Accuracy (delta < 1.25)	76.9	435
Monocular Depth Estimation	NYU v2 (test)	Abs Rel	0.158	320
Depth Estimation	KITTI (Eigen split)	RMSE	6.307	291
Monocular Depth Estimation	KITTI	Abs Rel	0.203	220
Depth Estimation	NYU Depth V2	RMSE	0.907	209
Monocular Depth Estimation	KITTI Raw Eigen (test)	RMSE	6.307	159
Depth Estimation	KITTI	RMSE	7.156	156
Monocular Depth Estimation	KITTI 80m maximum depth (Eigen)	Abs Rel	0.203	126
Surface Normal Prediction	NYU V2	Mean Error	23.7	123

Showing 10 of 33 rows
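For readers unfamiliar with the depth metrics named in the table, the sketch below computes Abs Rel, RMSE, and threshold accuracy (delta < 1.25) using their standard definitions. It is an illustration only: the function name and the flat-list inputs are assumptions, and it does not reproduce any benchmark's evaluation protocol (depth caps, crops, or valid-pixel masks).

```python
import math


def depth_metrics(pred, target, threshold=1.25):
    """Standard depth metrics over paired positive depths (illustrative sketch).

    Returns a dict with:
      abs_rel: mean of |pred - target| / target
      rmse:    square root of the mean squared error
      delta:   fraction of pixels with max(pred/target, target/pred) < threshold
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal length")

    n = len(pred)
    pairs = list(zip(pred, target))

    abs_rel = sum(abs(p - t) / t for p, t in pairs) / n
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in pairs) / n)
    delta = sum(1 for p, t in pairs if max(p / t, t / p) < threshold) / n

    return {"abs_rel": abs_rel, "rmse": rmse, "delta": delta}
```

Note that this function returns threshold accuracy as a fraction between 0 and 1, while the value 76.9 in the table appears to be a percentage. Lower is better for Abs Rel and RMSE; higher is better for threshold accuracy.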
