Learning to Recover 3D Scene Shape from a Single Image

About

Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape. There are two reasons: an unknown depth shift induced by the shift-invariant reconstruction losses used in mixed-data depth prediction training, and a possibly unknown camera focal length. We investigate this problem in detail and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length, which allows us to recover a realistic 3D scene shape. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to enhance depth prediction models trained on mixed datasets. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot dataset generalization. Code is available at: https://git.io/Depth

Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen • 2020
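
To illustrate why the depth shift and focal length in the abstract above matter for 3D shape, here is a minimal sketch of pinhole back-projection. It is not the authors' code: the function name `unproject_depth`, the principal point at the image centre, and the square-pixel model are all assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def unproject_depth(
    depth: List[List[float]],
    focal_length: float,
    shift: float = 0.0,
) -> List[Point]:
    """Back-project a depth map to 3D points with a pinhole camera model.

    Assumptions (illustrative, not from the source): square pixels,
    principal point at the image centre, focal length in pixels, and
    `shift` added to every depth value before back-projection.
    """
    if focal_length <= 0:
        raise ValueError("focal_length must be positive")
    height = len(depth)
    width = len(depth[0])
    cx = (width - 1) / 2.0  # assumed principal point (x)
    cy = (height - 1) / 2.0  # assumed principal point (y)

    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            z = d + shift
            x = (u - cx) * z / focal_length
            y = (v - cy) * z / focal_length
            points.append((x, y, z))
    return points


# One image row with three pixels at increasing depth.
row = [[1.0, 2.0, 4.0]]
correct = unproject_depth(row, focal_length=2.0)
shifted = unproject_depth(row, focal_length=2.0, shift=1.0)

print(correct)  # depths 1, 2, 4
print(shifted)  # depths 2, 3, 5
```

With the shift applied, the depth ratio between the last and first point changes from 4:1 to 5:2, so no single global rescaling maps one point cloud onto the other. A wrong focal length similarly stretches or compresses the x and y extent relative to depth.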

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel 14.9 | 502 |
| Depth Estimation | NYU v2 (test) | Threshold Accuracy (delta < 1.25) 91.6 | 423 |
| Monocular Depth Estimation | NYU v2 (test) | Abs Rel 9 | 257 |
| Monocular Depth Estimation | KITTI (Eigen split) | Abs Rel 0.149 | 193 |
| Monocular Depth Estimation | KITTI | Abs Rel 0.149 | 161 |
| Monocular Depth Estimation | ETH3D | AbsRel 17.1 | 117 |
| Monocular Depth Estimation | NYU V2 | Delta 1 Acc 91.6 | 113 |
| Monocular Depth Estimation | KITTI (test) | Abs Rel Error 14.9 | 103 |
| Depth Estimation | ScanNet | AbsRel 9.6 | 94 |
| Monocular Depth Estimation | DIODE | AbsRel 27.1 | 93 |
Showing 10 of 49 rows
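
The metrics named in the table are not defined on this page. The sketch below uses the definitions commonly used in depth estimation, assuming those are what the table reports: absolute relative error is the mean of |pred - gt| / gt, and delta < 1.25 accuracy is the fraction of pixels where max(pred/gt, gt/pred) is below 1.25. The function names and the sample values are illustrative only. Entries such as 14.9 and 0.149 appear to be the same quantity written as a percentage and as a fraction.

```python
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute relative error: mean(|pred - gt| / gt)."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and equal in length")
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def delta_accuracy(
    pred: Sequence[float], gt: Sequence[float], threshold: float = 1.25
) -> float:
    """Fraction of values with max(pred/gt, gt/pred) < threshold."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and equal in length")
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return hits / len(gt)


# Made-up depths (positive, same unit) purely to exercise the functions.
gt_depth = [1.0, 2.0, 4.0, 8.0]
pred_depth = [1.1, 2.0, 3.0, 8.0]

rel = abs_rel(pred_depth, gt_depth)
acc = delta_accuracy(pred_depth, gt_depth)
print(f"Abs Rel: {rel:.4f} ({100 * rel:.2f}%)")
print(f"delta < 1.25: {100 * acc:.1f}%")
```

Both functions expect strictly positive depths with invalid pixels already filtered out.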