DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

About

Monocular depth estimation is a challenging task that predicts pixel-wise depth from a single 2D image. Current methods typically model this problem as a regression or classification task. We propose DiffusionDepth, a new approach that reformulates monocular depth estimation as a denoising diffusion process. It learns an iterative denoising process to 'denoise' a random depth distribution into a depth map under the guidance of monocular visual conditions. The process is performed in the latent space encoded by a dedicated depth encoder and decoder. Instead of diffusing ground truth (GT) depth, the model learns to reverse the process of diffusing its own refined depth into a random depth distribution. This self-diffusion formulation overcomes the difficulty of applying generative models to sparse GT depth scenarios. The proposed approach benefits this task by refining the depth estimate step by step, which helps generate accurate and highly detailed depth maps. Experimental results on the KITTI and NYU-Depth-V2 datasets suggest that a simple yet efficient diffusion approach could reach state-of-the-art performance in both indoor and outdoor scenarios with acceptable inference time.
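The iterative refinement described in the abstract can be illustrated with a generic deterministic (DDIM-style) reverse diffusion loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: it works directly on a depth array rather than in a learned latent space, it does not model the self-diffusion training scheme, and `toy_noise_predictor` is a hypothetical oracle standing in for the learned, image-conditioned network. The function names, the linear beta schedule, and the `stride` parameter are all illustrative choices.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Cumulative products of (1 - beta_t) for a linear beta schedule (an assumption).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def toy_noise_predictor(x_t, alpha_bar_t, condition):
    # Hypothetical stand-in for the learned denoiser: it treats `condition`
    # as the clean depth and returns the noise implied by x_t. A real model
    # would predict this from image features instead.
    return (x_t - np.sqrt(alpha_bar_t) * condition) / np.sqrt(1.0 - alpha_bar_t)

def denoise_depth(condition, predictor, alpha_bars, stride=50, seed=0):
    # Start from pure Gaussian noise with the same shape as the depth map.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(condition.shape)
    timesteps = np.arange(len(alpha_bars) - 1, -1, -stride)
    for i, t in enumerate(timesteps):
        ab_t = alpha_bars[t]
        # After the last step the sample is fully denoised (alpha_bar = 1).
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = predictor(x, ab_t, condition)
        # Estimate the clean depth, then move to the next (less noisy) step.
        x0 = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        x = np.sqrt(ab_prev) * x0 + np.sqrt(1.0 - ab_prev) * eps
    return x

# Toy usage: a synthetic 4x4 "depth map" plays the role of the condition.
target_depth = np.linspace(1.0, 10.0, 16).reshape(4, 4)
refined = denoise_depth(target_depth, toy_noise_predictor, make_alpha_bars())
```

Because the toy predictor is an oracle, the loop converges to `target_depth`; with a learned network the same loop would refine the estimate gradually over the steps.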

Yiqun Duan, Xianda Guo, Zheng Zhu • 2023

Related benchmarks

Task	Dataset	Metric	Value	
Monocular Depth Estimation	KITTI (Eigen)	Abs Rel	0.05	552
Monocular Depth Estimation	NYU v2 (test)	Abs Rel	0.085	327
Monocular Depth Estimation	KITTI	Abs Rel	0.05	220
Monocular Depth Estimation	NYU-Depth v2 (official)	Abs Rel	0.085	97
Monocular Depth Estimation	NYU V2	Runtime (s)	223	8
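For reference, Abs Rel (absolute relative error) is the mean of |predicted depth - GT depth| / GT depth over evaluated pixels. The sketch below is one way to compute it with NumPy. The function name `abs_rel` is illustrative, and masking out pixels with non-positive GT depth is an assumption (a common convention for sparse ground truth), not a statement of the exact evaluation protocol behind the numbers above.

```python
import numpy as np

def abs_rel(pred, gt):
    # Mean absolute relative error over pixels that have valid ground truth.
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0  # assumption: zero or negative GT marks a missing measurement
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

# Toy example: one of the four pixels has no ground truth and is ignored.
pred = [[1.1, 2.0], [0.0, 4.0]]
gt = [[1.0, 2.0], [0.0, 5.0]]
score = abs_rel(pred, gt)
```

Here the three valid pixels contribute relative errors of 0.1, 0, and 0.2, so `score` is about 0.1. Lower is better.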
