Scaling Beyond Masked Diffusion Language Models

About

Diffusion language models are a promising alternative to autoregressive models due to their potential for faster generation. Among discrete diffusion approaches, Masked diffusion currently dominates, largely driven by strong perplexity on language modeling benchmarks. In this work, we present the first scaling law study of uniform-state and interpolating discrete diffusion methods. We also show that Masked diffusion models can be made approximately 12% more FLOPs-efficient when trained with a simple cross-entropy objective. We find that perplexity is informative within a diffusion family but can be misleading across families, where models with worse likelihood scaling may be preferable due to faster and more practical sampling, as reflected by the speed-quality Pareto frontier. These results challenge the view that Masked diffusion is categorically the future of diffusion language modeling and that perplexity alone suffices for cross-algorithm comparison. Scaling all methods to 1.7B parameters, we show that uniform-state diffusion remains competitive on likelihood-based benchmarks and outperforms autoregressive and Masked diffusion models on GSM8K, despite worse validation perplexity. We provide the code, model checkpoints, and video tutorials on the project page: http://s-sahoo.github.io/scaling-dllms
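As a rough illustration of the speed-quality Pareto frontier mentioned above, the following minimal sketch keeps only the configurations that no other configuration beats on both sampling latency and a quality loss (for example, generative perplexity). The function name `pareto_frontier`, the labels, and all numbers are hypothetical placeholders chosen for illustration; they are not results or code from the paper.

```python
from typing import List, Tuple

# (label, latency, quality_loss); lower is better for both values.
Point = Tuple[str, float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points that are not dominated on (latency, quality_loss)."""
    frontier: List[Point] = []
    best_loss = float("inf")
    # Sweep in order of increasing latency (ties broken by loss) and keep a
    # point only if it strictly improves on the best loss seen so far.
    for label, latency, loss in sorted(points, key=lambda p: (p[1], p[2])):
        if loss < best_loss:
            frontier.append((label, latency, loss))
            best_loss = loss
    return frontier


# Hypothetical placeholder measurements, NOT taken from the paper.
example_points: List[Point] = [
    ("A-fast", 1.0, 30.0),
    ("A-slow", 4.0, 22.0),
    ("B-fast", 1.5, 26.0),
    ("B-slow", 4.0, 24.0),
    ("C", 2.0, 28.0),
]

frontier_labels = [label for label, _, _ in pareto_frontier(example_points)]
print(frontier_labels)
```

Under this view, a model family with worse perplexity can still contribute points to the frontier if it reaches a given quality level at lower sampling cost, which is why a single likelihood number may not settle a cross-family comparison.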

Subham Sekhar Sahoo, Jean-Marie Lemercier, Zhihan Yang, Justin Deschenaux, Jingyu Liu, John Thickstun, Ante Jukic • 2026

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	PIQA	Accuracy 62.7	757
Question Answering	ARC-E	Accuracy 53.4	544
Question Answering	OBQA	Accuracy 33	347
Question Answering	BoolQ	Accuracy 62.8	317
Commonsense Reasoning	SIQA	Accuracy 39.2	183
Social Interaction Question Answering	SIQA	Accuracy 41.9	157
Multiple-choice Question Answering	OBQA	Accuracy 40.4	79
Reading Comprehension	RACE	Accuracy 35	75
Multiple-choice Question Answering	PIQA	Accuracy 78.1	67
Multiple-choice Question Answering	RACE	Accuracy 36.2	64

Showing 10 of 13 rows
