Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis

About

The success of deep learning in computer vision over the past decade has hinged on large labeled datasets and strong pretrained models. In data-scarce settings, the quality of these pretrained models becomes crucial for effective transfer learning. Image classification and self-supervised learning have traditionally been the primary methods for pretraining CNNs and transformer-based architectures. Recently, the rise of text-to-image generative models, particularly those using denoising diffusion in a latent space, has introduced a new class of foundational models trained on massive, captioned image datasets. These models' ability to generate realistic images of unseen content suggests they possess a deep understanding of the visual world. In this work, we present Marigold, a family of conditional generative models and a fine-tuning protocol that extracts the knowledge from pretrained latent diffusion models like Stable Diffusion and adapts them for dense image analysis tasks, including monocular depth estimation, surface normal prediction, and intrinsic decomposition. Marigold requires minimal modification of the pretrained latent diffusion model's architecture, trains with small synthetic datasets on a single GPU over a few days, and demonstrates state-of-the-art zero-shot generalization. Project page: https://marigoldcomputervision.github.io
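
The "minimal modification" can be illustrated with a toy example. The sketch below assumes the scheme described in the earlier Marigold depth paper, as we understand it: the image latent and the noisy target latent (4 channels each, as in Stable Diffusion) are concatenated along the channel axis, so the denoiser's first layer must accept twice as many input channels. Its pretrained weights are duplicated along the input dimension and halved, so feeding the same latent into both halves reproduces the pretrained response. The real first layer is a spatial convolution; a per-pixel 1×1 version is used here because the argument carries over unchanged. The helpers `expand_input_weights` and `apply_1x1`, and all numeric values, are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # weight of a 1x1 conv: [out_channels][in_channels]


def expand_input_weights(weight: Matrix) -> Matrix:
    """Duplicate each row's input weights and halve them: [W] -> [W/2 | W/2]."""
    return [[w / 2.0 for w in row] + [w / 2.0 for w in row] for row in weight]


def apply_1x1(weight: Matrix, x: List[float]) -> List[float]:
    """Apply a 1x1 convolution to a single pixel's channel vector x."""
    assert all(len(row) == len(x) for row in weight), "channel mismatch"
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Hypothetical pretrained first layer: 4 latent channels -> 2 output channels.
pretrained = [
    [0.5, -1.0, 0.25, 2.0],
    [1.5, 0.0, -0.5, 1.0],
]
expanded = expand_input_weights(pretrained)  # now expects 8 input channels

image_latent = [0.2, -0.4, 1.0, 0.3]  # hypothetical image latent at one pixel
noisy_target = [0.1, 0.0, -0.7, 0.5]  # hypothetical noisy target latent

# Conditioned input: channel-wise concatenation (4 + 4 = 8 channels).
conditioned = image_latent + noisy_target
out = apply_1x1(expanded, conditioned)

# Sanity check: with identical halves, the expanded layer matches the
# pretrained layer (up to float rounding), so activation magnitudes at
# initialization stay in the range the rest of the network was trained on.
same = apply_1x1(expanded, image_latent + image_latent)
ref = apply_1x1(pretrained, image_latent)
```

Halving rather than zero-initializing the new channels keeps the pretrained layer's statistics intact while still letting both inputs influence the output from the first training step.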

Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, Konrad Schindler • 2025

Related benchmarks

Task                        Dataset               Metric    Result  Rank
Monocular Depth Estimation  NYU v2 (test)         AbsRel    0.542   257
Monocular Depth Estimation  ETH3D                 AbsRel    6.9     117
Monocular Depth Estimation  DIODE                 AbsRel    29.8    93
Monocular Depth Estimation  ScanNet               AbsRel    5.8     64
Surface Normal Estimation   NYU V2                --        --      23
Monocular Depth Estimation  NYU                   AbsRel    5.5     21
Albedo Estimation           ARAP                  LMSE      0.022   19
Monocular Depth Estimation  KITTI                 AbsRel    10.5    12
Albedo Estimation           IIW v1.1 (test)       WHDR 10%  16.7    11
Albedo Estimation           Interiorverse (test)  PSNR      19.5    10

(10 of 16 rows shown)
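
The AbsRel figures above are mean absolute relative errors between predicted and ground-truth depth. Marigold's depth output is affine-invariant (defined up to an unknown scale and shift), so evaluations of such models typically first align the prediction to the ground truth with a least-squares scale and shift. The sketch below follows that common protocol under stated assumptions: pixels with non-positive ground truth are treated as invalid, the fit is done directly in depth space, and the result is reported in percent (most AbsRel values in the table appear to be percentages, but that is an assumption). The functions `fit_scale_shift` and `abs_rel_percent` are hypothetical; this is not the benchmark's evaluation code.

```python
from typing import List, Tuple


def fit_scale_shift(pred: List[float], gt: List[float]) -> Tuple[float, float]:
    """Closed-form least squares for (s, t) minimizing sum((s * p + t - g) ** 2)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt))
    var = sum((p - mean_p) ** 2 for p in pred)
    if var == 0.0:
        raise ValueError("prediction is constant; scale is undetermined")
    scale = cov / var
    shift = mean_g - scale * mean_p
    return scale, shift


def abs_rel_percent(pred: List[float], gt: List[float]) -> float:
    """AbsRel in percent after affine alignment; gt <= 0 marks invalid pixels."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if len(valid) < 2:
        raise ValueError("need at least two valid pixels to fit scale and shift")
    p_valid = [p for p, _ in valid]
    g_valid = [g for _, g in valid]
    s, t = fit_scale_shift(p_valid, g_valid)
    # Mean of |aligned - gt| / gt over valid pixels, scaled to percent.
    errors = [abs(s * p + t - g) / g for p, g in zip(p_valid, g_valid)]
    return 100.0 * sum(errors) / len(errors)


# Toy flattened depth maps; the last pixel has no ground truth (0.0).
gt = [1.0, 2.0, 3.0, 4.0, 0.0]
pred = [0.0, 0.5, 1.0, 1.5, 9.9]  # equals (gt - 1) / 2 on valid pixels
print(abs_rel_percent(pred, gt))  # prints 0.0
```

Because the toy prediction is an exact affine transform of the ground truth, alignment removes all error. Some protocols fit scale and shift in inverse-depth (disparity) space instead, which would change the resulting numbers.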
