
Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

About

Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-posedness of the single-image reconstruction problem, most well-established methods are built upon multi-view geometry. State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity. Meanwhile, SOTA monocular methods trained on large mixed datasets achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. In this work, we show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models. Equipped with our module, monocular models can be stably trained with over 8 million images with thousands of camera models, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Experiments demonstrate SOTA performance of our method on 7 zero-shot benchmarks. Notably, our method won the championship in the 2nd Monocular Depth Estimation Challenge. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology. The potential benefits extend to downstream tasks, which can be significantly improved by simply plugging in our model. For example, our model relieves the scale drift issues of monocular-SLAM (Fig. 1), leading to high-quality metric scale dense mapping. The code is available at https://github.com/YvanYin/Metric3D.
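The core idea of resolving metric ambiguity is to map training data from many different cameras into a single canonical camera space, typically by rescaling depth with the ratio of a fixed canonical focal length to the actual focal length. The sketch below illustrates that rescaling under stated assumptions; the function names and the canonical focal value are illustrative, not taken from the official implementation:

```python
# Sketch of a canonical-camera-space depth transformation, assuming the
# label-rescaling strategy: metric depth is scaled by f_canonical / f so
# that images from arbitrary cameras share one metric space during
# training. CANONICAL_FOCAL_PX and the function names are assumptions.

CANONICAL_FOCAL_PX = 1000.0  # illustrative canonical focal length (pixels)

def to_canonical_depth(depth_m: float, focal_px: float) -> float:
    """Map a metric depth from a real camera into canonical space."""
    return depth_m * CANONICAL_FOCAL_PX / focal_px

def from_canonical_depth(depth_canon: float, focal_px: float) -> float:
    """De-canonicalize a predicted depth back to real-world metric scale,
    given the focal length of the camera that captured the image."""
    return depth_canon * focal_px / CANONICAL_FOCAL_PX
```

At inference time, the model predicts depth in canonical space and only the de-canonicalization step needs the true focal length, which is why a single model can serve thousands of camera models.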

Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Depth Estimation | NYU v2 (test) | Threshold Accuracy (δ < 1.25) | 96.6 | 423 |
| Monocular Depth Estimation | NYU v2 (test) | Abs Rel | 0.094 | 257 |
| Depth Estimation | NYU Depth V2 | RMSE | 0.183 | 177 |
| Monocular Depth Estimation | DDAD (test) | RMSE | 12.15 | 122 |
| Monocular Depth Estimation | ETH3D | AbsRel | 0.859 | 117 |
| Monocular Depth Estimation | KITTI (test) | Abs Rel Error | 0.053 | 103 |
| Monocular Depth Estimation | KITTI Eigen split (test) | AbsRel Mean | 5.33 | 94 |
| Depth Estimation | KITTI | AbsRel | 0.044 | 92 |
| Depth Estimation | ScanNet (test) | Abs Rel | 0.074 | 65 |
| Depth Estimation | NYU v2 (val) | RMSE | 0.337 | 53 |

Showing 10 of 66 rows.

Other info

Code: https://github.com/YvanYin/Metric3D