The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation

About

Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity. We show that they also excel at estimating optical flow and monocular depth, surprisingly without the task-specific architectures and loss functions that predominate for these tasks. Unlike the point estimates of conventional regression-based methods, diffusion models also enable Monte Carlo inference, e.g., capturing uncertainty and ambiguity in flow and depth. With self-supervised pre-training, combined synthetic and real data for supervised training, technical innovations (infilling and step-unrolled denoising diffusion training) to handle noisy, incomplete training data, and a simple form of coarse-to-fine refinement, one can train state-of-the-art diffusion models for depth and optical flow estimation. Extensive experiments focus on quantitative performance against benchmarks, ablations, and the model's ability to capture uncertainty and multimodality and to impute missing values. Our model, DDVM (Denoising Diffusion Vision Model), obtains a state-of-the-art relative depth error of 0.074 on the indoor NYU benchmark and an Fl-all outlier rate of 3.26% on the KITTI optical flow benchmark, about 25% better than the best published method. For an overview, see https://diffusion-vision.github.io.
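The Monte Carlo inference mentioned in the abstract can be illustrated with a minimal sketch: draw several samples from a stochastic predictor for the same input, then summarize them per pixel with a mean and a variance. Everything named here is an assumption for illustration. `toy_sampler` is a hypothetical stand-in that simply adds Gaussian noise to its input and is not the DDVM sampler, and `monte_carlo_estimate` is a hypothetical helper name.

```python
import random
import statistics


def toy_sampler(image, rng):
    """Hypothetical stand-in for one stochastic sampling run.

    Returns one value per pixel: the input pixel plus Gaussian noise.
    A real diffusion model would run its reverse process here instead.
    """
    return [px + rng.gauss(0.0, 0.1) for px in image]


def monte_carlo_estimate(image, sampler, num_samples=8, seed=0):
    """Draw num_samples predictions and return per-pixel (mean, variance)."""
    rng = random.Random(seed)
    samples = [sampler(image, rng) for _ in range(num_samples)]
    # Transpose so that each tuple holds all samples for a single pixel.
    per_pixel = list(zip(*samples))
    mean = [statistics.fmean(values) for values in per_pixel]
    variance = [statistics.pvariance(values) for values in per_pixel]
    return mean, variance


if __name__ == "__main__":
    mean, variance = monte_carlo_estimate([1.0, 2.0, 3.0], toy_sampler)
    print("per-pixel mean:", mean)
    print("per-pixel variance:", variance)
```

Pixels with high variance across samples are the ones where the predictor is uncertain or where several answers are plausible, which is information a single regression output cannot provide.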

Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J. Fleet • 2023

Related benchmarks

Task	Dataset	Metric	Result
Optical Flow Estimation	KITTI 2015 (train)	Fl-epe	2.19	446
Monocular Depth Estimation	NYU v2 (test)	Abs Rel	0.074	320
Monocular Depth Estimation	KITTI	Abs Rel	0.055	220
Monocular Depth Estimation	NYU v2	Delta 1 Acc	94.6	174
Optical Flow Estimation	Sintel Final (test)	EPE	2.48	133
Optical Flow Estimation	Sintel Clean (test)	EPE	1.75	120
Optical Flow	Sintel Final (train)	EPE	2	112
Optical Flow Estimation	KITTI 2015 (test)	Fl-all	3.26	108
Optical Flow	Sintel Clean (train)	EPE	1.24	104
Optical Flow	KITTI (train)	Fl-all	0.0326	90
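For readers unfamiliar with the metric names in the table, the sketch below computes them using their standard definitions: Abs Rel is the mean of |pred - gt| / gt over valid depth pixels, Delta 1 accuracy is the fraction of pixels with max(pred/gt, gt/pred) < 1.25, EPE is the mean Euclidean distance between predicted and ground-truth flow vectors, and Fl-all counts a pixel as an outlier when its end-point error exceeds both 3 px and 5% of the ground-truth flow magnitude. The function names and the flat-list data layout are assumptions made for illustration, and this is not the evaluation code used for the reported numbers.

```python
import math


def abs_rel(pred, gt):
    """Mean absolute relative depth error over pixels with valid ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, thresh=1.25):
    """Fraction of valid pixels whose depth ratio is within thresh."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    # Non-positive predictions are counted as misses.
    hits = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < thresh)
    return hits / len(pairs)


def epe(pred_flow, gt_flow):
    """Mean end-point error between (u, v) flow vectors, in pixels."""
    errors = [math.hypot(pu - gu, pv - gv)
              for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow)]
    return sum(errors) / len(errors)


def fl_all(pred_flow, gt_flow, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of pixels whose error exceeds both thresholds."""
    outliers = 0
    for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow):
        err = math.hypot(pu - gu, pv - gv)
        mag = math.hypot(gu, gv)
        if err > abs_thresh and err > rel_thresh * mag:
            outliers += 1
    return 100.0 * outliers / len(gt_flow)


if __name__ == "__main__":
    depth_pred, depth_gt = [2.2, 0.9, 3.0], [2.0, 1.0, 2.0]
    print("Abs Rel:", abs_rel(depth_pred, depth_gt))
    print("Delta 1:", delta1(depth_pred, depth_gt))
    flow_pred = [(1.0, 0.0), (10.0, 0.0), (104.0, 0.0)]
    flow_gt = [(0.0, 0.0), (6.0, 0.0), (100.0, 0.0)]
    print("EPE:", epe(flow_pred, flow_gt))
    print("Fl-all (%):", fl_all(flow_pred, flow_gt))
```

In the flow example, the third pixel has a 4 px error but is not an outlier, because 4 px is under 5% of its 100 px ground-truth magnitude. This shows why Fl-all and EPE can rank methods differently.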

