
One Step Diffusion via Shortcut Models

About

Diffusion models and flow-matching models have enabled generating diverse and realistic images by learning to transfer noise to data. However, sampling from these models involves iterative denoising over many neural network passes, making generation slow and expensive. Previous approaches for speeding up sampling require complex training regimes, such as multiple training phases, multiple networks, or fragile scheduling. We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples in a single or multiple sampling steps. Shortcut models condition the network not only on the current noise level but also on the desired step size, allowing the model to skip ahead in the generation process. Across a wide range of sampling step budgets, shortcut models consistently produce higher quality samples than previous approaches, such as consistency models and reflow. Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.
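The key idea in the abstract, conditioning the network on a desired step size so it can jump ahead along the noise-to-data path, can be illustrated with a toy sampling loop. The `model(x, t, d)` signature and the linear noise-to-data path below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def sample_shortcut(model, x_noise, num_steps):
    """Generate a sample with `num_steps` equal-size shortcut steps.

    `model(x, t, d)` is assumed to predict the update direction (the
    "shortcut") at state x, time t, and step size d -- an illustrative
    signature, not the paper's exact API.
    """
    x = x_noise
    d = 1.0 / num_steps  # constant step size over the trajectory
    for k in range(num_steps):
        t = k * d  # current position on the noise -> data path
        x = x + d * model(x, t, d)  # one network pass jumps ahead by d
    return x

# Toy "model": for a straight path x_t = (1 - t) * noise + t * target,
# the ideal shortcut direction is (target - noise) at every t and d.
target = np.array([1.0, -2.0, 0.5])
noise = np.zeros(3)
toy_model = lambda x, t, d: target - noise

# With an ideal shortcut model, a 1-step sample matches a 128-step sample,
# which is what lets the step budget vary freely at inference time.
one_step = sample_shortcut(toy_model, noise, num_steps=1)
many_steps = sample_shortcut(toy_model, noise, num_steps=128)
```

Because the step size is an input rather than fixed by a schedule, the same trained network can be queried with a large `d` for fast one-step generation or a small `d` for higher-fidelity multi-step sampling.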

Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Class-conditional Image Generation | ImageNet 256x256 | - | 815 |
| Class-conditional Image Generation | ImageNet 256x256 (val) | FID 3.8 | 427 |
| Image Generation | ImageNet 256x256 | IS 102.7 | 359 |
| Image Generation | ImageNet 256x256 (val) | FID 10.6 | 340 |
| Class-conditional Image Generation | ImageNet 256x256 (test) | FID 7.8 | 208 |
| Class-conditional Image Generation | ImageNet 256x256 (train val) | FID 7.8 | 178 |
| Class-conditional generation | ImageNet 256 x 256 1k (val) | - | 102 |
| Class-conditional Image Generation | ImageNet class-conditional 256x256 (test val) | FID 7.8 | 81 |
| Image Generation | ImageNet 256x256 (test) | FID 10.6 | 54 |
| Conditional Image Generation | ImageNet 256px 2012 (val) | FID 7.8 | 50 |

Showing 10 of 18 rows.
