The Diffusion Duality, Chapter II: $\Psi$-Samplers

About

Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or Masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that Masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video-tutorial on: https://s-sahoo.com/duo-ch2

Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo• 2026

Related benchmarks

Task	Dataset	Result
Language modelling	LM1B (test)	Perplexity30	151
Multiple-choice Question Answering	ARC Easy (test)	Accuracy28.28	68
Multiple-choice Question Answering	ARC Challenge (test)	Accuracy26.11	57
Unconditional Generation	LM1B sequence length 128	Generation Perplexity (PPL)56.2	43
Language Modeling	OpenWebText (OWT) (val)	Perplexity25.2	42
Question Answering	MathQA (test)	Accuracy21.01	41
Unconditional Generation	OpenWebText L=1024 (test)	Generation Perplexity18.3	40
Multiple-choice Question Answering	PIQA (test)	Accuracy (PIQA)52.12	27
Language Modeling	PTB zero-shot	Perplexity91.94	25
Language Modeling	Pubmed zero-shot	Perplexity43.98	20

Showing 10 of 19 rows

Other info

Follow for update

@wizwand_team Discord