
Partition Generative Modeling: Masked Modeling Without Masks

About

Masked generative models (MGMs) can generate tokens in parallel and in any order, unlike autoregressive models (ARMs), which decode one token at a time, left-to-right. However, MGMs process the full-length sequence at every sampling step, including mask tokens that carry no information. In contrast, ARMs process only the previously generated tokens. We introduce "Partition Generative Models" (PGMs), which replace masking with partitioning. Tokens are split into two groups that cannot attend to each other, and the model learns to predict each group conditioned on the other, eliminating mask tokens entirely. Because the groups do not interact, PGMs can process only the clean tokens during sampling, like ARMs, while retaining parallel, any-order generation, like MGMs. On OpenWebText, PGMs achieve $5$ to $5.5\times$ higher throughput than MDLM while producing samples with lower Generative Perplexity. On ImageNet, PGMs reach comparable FID to MaskGIT with a $7.5\times$ throughput improvement. With twice as many steps, the FID improves to 4.56 while remaining $3.9\times$ faster than MGMs. Finally, PGMs remain compatible with existing MGM samplers and distillation methods.
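
The partitioning idea can be illustrated with a small attention mask. The sketch below is not the authors' implementation: the helper name `partition_attention_mask` and the example group assignment are hypothetical, and it only shows the constraint described above, namely that a position may attend to another position only when both belong to the same group.

```python
def partition_attention_mask(groups):
    """Return allowed[i][j], which is True only when positions i and j
    are in the same group, so the two groups never attend to each other."""
    return [[gi == gj for gj in groups] for gi in groups]


# Hypothetical assignment of 8 token positions to group 0 or group 1.
# How a real model samples this assignment is not specified here.
groups = [0, 1, 1, 0, 1, 0, 0, 1]
mask = partition_attention_mask(groups)

# Print the mask as a grid: 1 = attention allowed, 0 = blocked.
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

Because each group sees only itself, the positions of one group can be dropped from the input entirely when predicting the other, which is the property the abstract relies on for processing only clean tokens during sampling.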

Justin Deschenaux, Lan Tran, Caglar Gulcehre • 2025

Related benchmarks

Task | Dataset | Result | Rank
Commonsense Reasoning | HellaSwag | Accuracy 33.92 | 1460
Commonsense Reasoning | WinoGrande | Accuracy 54.3 | 776
Question Answering | ARC Challenge | Accuracy 24.06 | 749
Commonsense Reasoning | PIQA | Accuracy 61.43 | 647
Question Answering | ARC Easy | Accuracy 38.8 | 386
Language Modeling | LAMBADA | Accuracy 46.98 | 183
Mathematical Reasoning | MathQA | Accuracy 21.71 | 95
Word Prediction | LAMBADA (test) | Accuracy 47.22 | 53
Question Answering | BoolQ (test) | Accuracy 53.49 | 46
Commonsense Reasoning | PIQA (test) | Accuracy 59.85 | 46
Showing 10 of 18 rows
