Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

About

Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth from a single image is geometrically ill-posed and requires scene understanding, so it is not surprising that the rise of deep learning has led to a breakthrough. The impressive progress of monocular depth estimators has mirrored the growth in model capacity, from relatively modest CNNs to large Transformer architectures. Still, monocular depth estimators tend to struggle when presented with images with unfamiliar content and layout, since their knowledge of the visual world is restricted by the data seen during training, and challenged by zero-shot generalization to new domains. This motivates us to explore whether the extensive priors captured in recent generative diffusion models can enable better, more generalizable depth estimation. We introduce Marigold, a method for affine-invariant monocular depth estimation that is derived from Stable Diffusion and retains its rich prior knowledge. The estimator can be fine-tuned in a couple of days on a single GPU using only synthetic training data. It delivers state-of-the-art performance across a wide range of datasets, including over 20% performance gains in specific cases. Project page: https://marigoldmonodepth.github.io.
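As an illustration of what "affine-invariant" means in practice: such a prediction is only defined up to an unknown scale and shift, so a common convention is to fit those two numbers by least squares against reference depth before comparing. The sketch below shows that fit in plain Python. The function names and the toy values are hypothetical and are not taken from the Marigold code base.

```python
def align_scale_shift(pred, target):
    """Least-squares fit of (scale, shift) so that scale * pred + shift ~ target.

    Closed-form solution of a 1-D linear regression:
    scale = cov(pred, target) / var(pred), shift = mean(target) - scale * mean(pred).
    """
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant, so the scale is undetermined")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift


def apply_alignment(pred, scale, shift):
    """Map an affine-invariant prediction into the units of the reference."""
    return [scale * p + shift for p in pred]


# Hypothetical toy data: the reference equals 4 * relative + 2.
relative = [0.0, 0.25, 0.5, 1.0]
metric = [2.0, 3.0, 4.0, 6.0]
scale, shift = align_scale_shift(relative, metric)
aligned = apply_alignment(relative, scale, shift)
```

On real depth maps the same fit would typically be run over valid pixels only, flattened into one sequence per image.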

Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Monocular Depth Estimation | KITTI (Eigen) | Abs Rel 9.9 | 523 |
| Depth Estimation | NYU v2 (test) | Threshold Accuracy (delta < 1.25) 90.4 | 432 |
| Monocular Depth Estimation | NYU v2 (test) | Abs Rel 0.055 | 300 |
| Monocular Depth Estimation | KITTI (Eigen split) | Abs Rel 0.099 | 215 |
| Depth Estimation | NYU Depth V2 | -- | 209 |
| Monocular Depth Estimation | KITTI | Abs Rel 0.09 | 203 |
| Depth Completion | NYU-depth-v2 official (test) | -- | 200 |
| Video Depth Estimation | Sintel | Delta Threshold Accuracy (1.25) 51.5 | 193 |
| Monocular Depth Estimation | ETH3D | AbsRel 6 | 132 |
| Monocular Depth Estimation | NYU V2 | Delta 1 Acc 96.4 | 131 |
Showing 10 of 133 rows
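
For readers unfamiliar with the two metrics that recur in the rows above, here is a minimal sketch of their standard definitions: Abs Rel is the mean of |prediction − ground truth| / ground truth, and the delta threshold accuracy is the fraction of pixels where max(prediction / ground truth, ground truth / prediction) is below 1.25. Skipping pixels with non-positive ground truth is an assumption made here, following common practice. The function names and toy values are hypothetical. Some rows appear to report these quantities as percentages (for example 9.9) and others as fractions (for example 0.099); the functions below return fractions.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth values")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels whose ratio max(p/g, g/p) is below the threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth values")
    # A non-positive prediction can never satisfy the ratio test, so it counts as a miss.
    hits = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Hypothetical toy values, already expressed in the same units.
gt_depth = [1.0, 2.0, 4.0, 5.0]
pred_depth = [1.1, 2.0, 3.0, 5.5]
error = abs_rel(pred_depth, gt_depth)
accuracy = delta_accuracy(pred_depth, gt_depth)
```

Multiplying either result by 100 gives the percentage form used in some of the rows.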