Auto-Regressive Masked Diffusion Models

About

Masked diffusion models (MDMs) have emerged as a promising approach for language modeling, yet they face a performance gap compared to autoregressive models (ARMs) and require more training iterations. In this work, we present the Auto-Regressive Masked Diffusion (ARMD) model, an architecture designed to close this gap by unifying the training efficiency of autoregressive models with the parallel generation capabilities of diffusion-based models. Our key insight is to reframe the masked diffusion process as a block-wise causal model. This perspective allows us to design a strictly causal, permutation-equivariant architecture that computes all conditional probabilities across multiple denoising steps in a single, parallel forward pass. The resulting architecture supports efficient, autoregressive-style decoding and a progressive permutation training scheme, allowing the model to learn both canonical left-to-right and random token orderings. Leveraging this flexibility, we introduce a novel strided parallel generation strategy that accelerates inference by generating tokens in parallel streams while maintaining global coherence. Empirical results demonstrate that ARMD achieves state-of-the-art performance on standard language modeling benchmarks, outperforming established diffusion baselines while requiring significantly fewer training steps. Furthermore, it establishes a new benchmark for parallel text generation, effectively bridging the performance gap between parallel and sequential decoding.
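
To make the block-wise causal view concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that "strictly causal" means a position may condition only on positions revealed in strictly earlier denoising steps, and that a strided schedule splits the sequence into contiguous chunks and reveals one token per chunk at each step. The function names `strided_block_ids` and `block_causal_mask` are hypothetical, chosen for illustration.

```python
def strided_block_ids(seq_len, num_streams):
    """Assign each position the step at which it is generated.

    Assumption (not from the paper): the sequence is split into
    `num_streams` contiguous chunks, and step t reveals the t-th
    token of every chunk in parallel.
    """
    chunk = -(-seq_len // num_streams)  # ceiling division
    return [pos % chunk for pos in range(seq_len)]


def block_causal_mask(block_ids):
    """mask[i][j] is True when position i may condition on position j.

    Assumption: a position sees only positions from strictly earlier
    blocks, so tokens generated in the same step do not see each other.
    """
    return [[bj < bi for bj in block_ids] for bi in block_ids]


block_ids = strided_block_ids(seq_len=8, num_streams=2)
mask = block_causal_mask(block_ids)

# Print the dependency pattern: 1 = visible, 0 = hidden.
for row in mask:
    print("".join("1" if visible else "0" for visible in row))
```

With two streams over eight positions, positions 0 and 4 share the first step and see nothing, while position 5 sees positions 0 and 4 but not position 1, which is generated in the same step. A left-to-right ordering is the special case `block_ids = list(range(seq_len))`.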

Mahdi Karami, Ali Ghodsi • 2026

Related benchmarks

Task                | Dataset                           | Result            | Rank
Language modelling  | LM1B (test)                       | Perplexity 22.36  | 120
Language Modeling   | LAMBADA zero-shot (test)          | --                | 44
Language Modeling   | WikiText-103 zero-shot (test)     | PPL 25.55         | 34
Language Modeling   | PTB zero-shot                     | Perplexity 97.75  | 23
Language Modeling   | WikiText2 zero-shot               | Perplexity 26.06  | 13
Language Modeling   | 1BW zero-shot                     | Perplexity 43.91  | 13
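
For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token, so lower is better. The sketch below shows that standard definition only. The listing does not state how its numbers were computed, and the helper name `perplexity` and the toy inputs are illustrative assumptions.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative log-likelihood per token.

    `token_log_probs` holds natural-log probabilities the model
    assigned to each observed token (toy values here, not real data).
    """
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that spreads probability uniformly over 4 choices at every
# token has a perplexity of about 4.
uniform_log_probs = [math.log(0.25)] * 10
print(perplexity(uniform_log_probs))
```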