
Scaling Properties of Diffusion Models for Perceptual Tasks

About

In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm not only for generation but also for visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute on these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes for scaling diffusion models on visual perception tasks. Our models achieve performance competitive with state-of-the-art methods while using significantly less data and compute. Code and models are available at https://scaling-diffusion-perception.github.io.
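The test-time compute scaling described above comes from the iterative nature of diffusion inference: prediction starts from noise and is refined over a chosen number of denoising steps, so spending more steps buys a better estimate. The sketch below illustrates this with a toy stand-in for the learned denoiser (the `toy_denoiser` function, its `tanh` target, and the halving error model are all illustrative assumptions, not the paper's actual network or schedule).

```python
import numpy as np

def toy_denoiser(x_t, cond):
    # Hypothetical stand-in for a learned denoising network. Given a noisy
    # estimate x_t and a conditioning image, it pulls x_t halfway toward the
    # "true" target map (here arbitrarily tanh of the conditioning image),
    # mimicking one imperfect denoising step.
    target = np.tanh(cond)
    return target + 0.5 * (x_t - target)

def diffusion_inference(cond, num_steps, seed=0):
    """Diffusion-style iterative inference for image-to-image translation:
    start from Gaussian noise and apply the denoiser repeatedly.
    Each extra step spends more test-time compute and shrinks the error."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(cond.shape)   # x_T: pure Gaussian noise
    for _ in range(num_steps):
        x = toy_denoiser(x, cond)
    return x

# More denoising steps -> lower error against the toy target.
rgb = np.random.default_rng(1).standard_normal((8, 8))
target = np.tanh(rgb)
err4 = np.abs(diffusion_inference(rgb, 4) - target).mean()
err16 = np.abs(diffusion_inference(rgb, 16) - target).mean()
print(err16 < err4)  # prints True: error halves with each extra step here
```

In the toy model the error contracts geometrically with step count; in a real diffusion sampler the returns from extra steps diminish more slowly, which is exactly the trade-off a compute-optimal inference recipe has to balance.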

Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik • 2024

Related benchmarks

Task                    | Dataset            | Metric               | Result | Rank
Depth Estimation        | NYU Depth V2       | -                    | -      | 209
Depth Estimation        | ScanNet            | AbsRel               | 7.7    | 108
Depth Estimation        | DIODE              | Relative Error (REL) | 31     | 63
Depth Prediction        | ETH3D              | AbsRel               | 4.8    | 37
Metric Depth Estimation | Hypersim           | AbsRel               | 13.6   | 12
Optical Flow Prediction | FlyingChairs (val) | EPE                  | 3.08   | 11
Amodal Segmentation     | COCO-A (test)      | mIoU                 | 82.9   | 6
Amodal Segmentation     | MP3D (test)        | mIoU                 | 63.9   | 2
Amodal Segmentation     | P2G (test)         | mIoU                 | 88.6   | 2
