
The Diffusion Duality

About

Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks few-step generation in diffusion language models by accelerating sampling by two orders of magnitude. We provide the code, model checkpoints, and video tutorials on the project page: http://s-sahoo.github.io/duo
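The key insight can be illustrated with a small numerical sketch. Below, a clean token (one-hot vector) is corrupted by a Gaussian diffusion step and then discretized with an argmax; empirically, the resulting marginal keeps the clean token with some probability and is otherwise roughly uniform over the remaining symbols, matching a uniform-state discrete diffusion. The vocabulary size, noise level `alpha_t`, and sample count here are illustrative assumptions, not the paper's actual schedule.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8            # toy vocabulary size (assumed)
alpha_t = 0.3    # Gaussian diffusion signal level at time t (assumed)
n = 200_000      # Monte Carlo samples

# Clean token represented as a one-hot vector over the vocabulary.
x = np.zeros(K)
x[3] = 1.0

# One Gaussian diffusion step: signal scaled by alpha_t plus unit noise.
w_t = alpha_t * x + rng.normal(size=(n, K))

# Discretize the Gaussian latent by argmax -> a categorical token.
tokens = w_t.argmax(axis=1)

# Empirical marginal: the clean token survives with elevated probability,
# and the other K-1 symbols are hit (approximately) uniformly -- the
# signature of a uniform-state discrete diffusion process.
p = np.bincount(tokens, minlength=K) / n
```

By symmetry of the Gaussian noise, all non-clean symbols share the same marginal probability, which is what lets Gaussian-diffusion machinery be transferred to the discrete setting.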

Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, Volodymyr Kuleshov • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Language modelling | LM1B (test) | Perplexity | 22.3 | 120 |
| Image Generation | MNIST Binary (test) | FID | 6.52 | 98 |
| Image Generation | CIFAR-10 | FID | 69.87 | 88 |
| Molecular Generation | ZINC250K | Uniqueness | 942.2 | 68 |
| Molecule Generation | ZINC 250k 2012 | Validity Score | 942.2 | 56 |
| Molecule Generation | QM9 2014 | Novelty Score | 186.2 | 56 |
| Molecule Generation | QM9 2014 (test) | Uniqueness | 987.2 | 56 |
| Unconditional Text Generation | OpenWebText | Gen. PPL | 46.31 | 56 |
| Multiple-choice Question Answering | ARC Easy (test) | Accuracy | 44.95 | 50 |
| Multiple-choice Question Answering | ARC Challenge (test) | Accuracy | 25.43 | 26 |

Showing 10 of 28 rows.
