
Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

About

Distilling discrete diffusion models is currently difficult. In contrast, the continuous diffusion literature offers many distillation methods that can reduce sampling to a handful of steps. Our method, Discrete Moment Matching Distillation (D-MMD), adapts ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods collapse, D-MMD maintains high quality and diversity (given sufficient sampling steps), as demonstrated on both text and image datasets. Moreover, the distilled generators can outperform their teachers.
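The abstract does not spell out the D-MMD objective itself. As a generic illustration of the moment-matching idea it builds on, the sketch below compares the first and second moments of teacher and student feature batches; the function name, feature shapes, and loss form are illustrative assumptions, not the paper's actual loss.

```python
import numpy as np

def moment_matching_loss(teacher_feats, student_feats):
    """Squared difference between first and second moments of two
    feature batches. A generic moment-matching objective for
    illustration only, not the paper's D-MMD loss."""
    # First moments: per-dimension means over the batch.
    mu_t = teacher_feats.mean(axis=0)
    mu_s = student_feats.mean(axis=0)
    # Second moments: uncentered covariance E[x x^T], estimated per batch.
    cov_t = teacher_feats.T @ teacher_feats / len(teacher_feats)
    cov_s = student_feats.T @ student_feats / len(student_feats)
    return np.sum((mu_t - mu_s) ** 2) + np.sum((cov_t - cov_s) ** 2)

rng = np.random.default_rng(0)
teacher = rng.normal(size=(256, 8))   # stand-in for teacher sample features
student = rng.normal(size=(256, 8))   # stand-in for student sample features
print(moment_matching_loss(teacher, teacher))  # 0.0 when the batches coincide
print(moment_matching_loss(teacher, student))  # positive when moments differ
```

In a distillation setup, a loss of this shape would be minimized with respect to the student generator, pulling its sample statistics toward the teacher's.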

Emiel Hoogeboom, David Ruhe, Jonathan Heek, Thomas Mensink, Tim Salimans • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Text Generation | OWT | GPT2 Perplexity | 17.2 | 41
Image Generation | CIFAR10 | FID | 3.5 | 26
Unconditional Image Generation | CIFAR-10 (train) | FID | 3.5 | 24
Text Generation | Open Web Text (OWT) (val) | GPT-2 GM Score | 0.456 | 19
