Di$\mathtt{[M]}$O: Distilling Masked Diffusion Models into One-step Generator

About

Masked Diffusion Models (MDMs) have emerged as a powerful generative modeling technique. Despite their remarkable results, they typically suffer from slow inference because sampling requires multiple steps. In this paper, we propose Di$\mathtt{[M]}$O, a novel approach that distills masked diffusion models into a one-step generator. Di$\mathtt{[M]}$O addresses two key challenges: (1) the intractability of using intermediate-step information for one-step generation, which we solve through token-level distribution matching that optimizes the model's output logits within an on-policy framework, with the help of an auxiliary model; and (2) the lack of entropy in the initial distribution, which we address through a token initialization strategy that injects randomness while maintaining similarity to the teacher's training distribution. We show Di$\mathtt{[M]}$O's effectiveness on both class-conditional and text-conditional image generation, achieving performance competitive with the multi-step teacher's outputs while drastically reducing inference time. To our knowledge, we are the first to successfully achieve one-step distillation of masked diffusion models and the first to apply discrete distillation to text-to-image generation, opening new paths for efficient generative modeling.
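The abstract does not spell out the token initialization strategy, so the following is only a minimal sketch of one possible reading: start from an all-mask sequence, as a masked diffusion teacher would see at the first step, and replace a fraction of positions with uniformly random tokens to inject entropy. The names `MASK_ID`, `init_tokens`, and `random_ratio` are hypothetical and do not come from the paper.

```python
import random

# Hypothetical id standing in for the [M] (mask) token; not taken from the paper.
MASK_ID = -1


def init_tokens(seq_len, vocab_size, random_ratio=0.25, seed=None):
    """Build a mostly-masked sequence with a fraction of random tokens.

    Assumption: randomness is injected by overwriting a fixed fraction of
    mask positions with tokens drawn uniformly from the vocabulary.
    """
    rng = random.Random(seed)
    tokens = [MASK_ID] * seq_len
    n_random = round(random_ratio * seq_len)
    # Choose distinct positions, then fill each with a random vocabulary id.
    for pos in rng.sample(range(seq_len), n_random):
        tokens[pos] = rng.randrange(vocab_size)
    return tokens


tokens = init_tokens(seq_len=16, vocab_size=1024, random_ratio=0.25, seed=0)
n_masked = sum(t == MASK_ID for t in tokens)
print(f"{n_masked} masked, {len(tokens) - n_masked} random")
```

Keeping most positions masked keeps the input close to what the teacher was trained on, while the random positions give a one-step generator different inputs for the same condition. The actual ratio and sampling scheme used by the authors may differ.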

Yuanzhi Zhu, Xi Wang, Stéphane Lathuilière, Vicky Kalogeiton • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval | Overall Score | 43 | 704 |
| Class-conditional Image Generation | ImageNet 256x256 (val) | Inception Score (IS) | 310.1 | 493 |
| Text-to-Image Generation | MS-COCO | FID | 24.15 | 145 |
| Text-to-Image Generation | HPS v2.1 | Overall Score | 28.59 | 96 |
| Class-conditional Image Generation | ImageNet class-conditional 256x256 | Inception Score (IS) | 214 | 61 |
| Class-conditional Image Generation | ImageNet 256 | FID | 6.91 | 28 |
| Class-conditional Image Generation | ImageNet 256 | FID | 2.89 | 20 |