GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

About

We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes, e.g., depth and normals, from single images. While significant research has already been conducted in this area, progress has been substantially limited by the low diversity and poor quality of publicly available datasets. As a result, prior works are either constrained to limited scenarios or unable to capture geometric details. In this paper, we demonstrate that generative models, as opposed to traditional discriminative models (e.g., CNNs and Transformers), can effectively address this inherently ill-posed problem. We further show that leveraging diffusion priors can markedly improve generalization, detail preservation, and resource efficiency. Specifically, we extend the original Stable Diffusion model to jointly predict depth and normals, allowing mutual information exchange and high consistency between the two representations. More importantly, we propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions. This strategy enables our model to recognize different scene layouts, capturing 3D geometry with remarkable fidelity. GeoWizard sets new benchmarks for zero-shot depth and normal prediction, significantly enhancing many downstream applications such as 3D reconstruction, 2D content creation, and novel viewpoint synthesis.

Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel 9.7 | 502 |
| Monocular Depth Estimation | KITTI | Abs Rel 9.7 | 161 |
| Monocular Depth Estimation | ETH3D | AbsRel 6.4 | 117 |
| Monocular Depth Estimation | NYU V2 | Delta 1 Acc 96.6 | 113 |
| Surface Normal Prediction | NYU V2 | Mean Error 18.9 | 100 |
| Depth Estimation | ScanNet | AbsRel 0.066 | 94 |
| Monocular Depth Estimation | DIODE | AbsRel 12 | 93 |
| Depth Estimation | KITTI | AbsRel 0.129 | 92 |
| Monocular Depth Estimation | ScanNet | AbsRel 6.1 | 64 |
| Depth Estimation | DIODE | Delta-1 Accuracy 75.3 | 62 |
Showing 10 of 25 rows
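
The page does not define the metrics it reports. As a rough guide, the sketch below shows how AbsRel, Delta-1 accuracy, and mean angular error are commonly computed in the depth and normal estimation literature. The function names (`abs_rel`, `delta1`, `mean_angular_error_deg`), the validity rule (ground-truth depth greater than zero), and the 1.25 threshold are assumptions chosen for illustration; they are not taken from this page or from the GeoWizard evaluation code. The values in the table also appear to use two scales (for example 9.7 and 0.129 for AbsRel on KITTI), which may reflect percent versus fraction reporting across different sources; that is an inference, not something the page states.

```python
import math


def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid ground truth (gt > 0)."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in valid) / len(valid)


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels where max(pred/gt, gt/pred) < threshold.

    The 1.25 default is the conventional choice and is an assumption here.
    Non-positive predictions are counted as misses rather than skipped.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    hits = sum(1 for p, g in valid if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(valid)


def mean_angular_error_deg(pred_normals, gt_normals):
    """Mean angle in degrees between predicted and ground-truth normals."""
    total = 0.0
    for a, b in zip(pred_normals, gt_normals):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
        total += math.degrees(math.acos(cos_angle))
    return total / len(pred_normals)


if __name__ == "__main__":
    # Toy per-pixel depths; the last ground-truth value is 0 and is ignored.
    pred_depth = [1.0, 2.2, 3.0, 5.0]
    gt_depth = [1.0, 2.0, 4.0, 0.0]
    print("AbsRel:", abs_rel(pred_depth, gt_depth))
    print("Delta-1:", delta1(pred_depth, gt_depth))

    # Toy normals: one exact match and one perpendicular pair.
    pred_n = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
    gt_n = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
    print("Mean angular error (deg):", mean_angular_error_deg(pred_n, gt_n))
```

Under these definitions, lower is better for AbsRel and mean angular error, and higher is better for Delta-1 accuracy. Multiplying the fractional outputs by 100 gives the percent form.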
