
Adaptive Order Policies for Masked Diffusion

About

Masked diffusion models have seen great success in capturing data distributions over discrete sequences in domains such as text and proteins. These models generate data by iteratively unmasking tokens starting from a fully masked sequence, with the unmasking order typically chosen at random or using a heuristic based on denoiser probabilities. In this work, we propose a scheme for learning the unmasking order using an additional lightweight policy network on top of a diffusion model. Our proposed loss reweights terms in the masked diffusion loss according to policy probabilities, and results in a policy that prefers positions where the denoiser is more likely to be correct. We study this loss in two settings: (i) training solely the policy while using a frozen pre-trained denoiser, and (ii) training the policy and denoiser jointly with the weighted loss to allow for mutual adaptation. We demonstrate that our approach outperforms common heuristics on problems that are sensitive to token ordering, such as combinatorial tasks and proteins.
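The generation procedure described above (start from a fully masked sequence, then repeatedly choose a position and unmask it) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: `toy_denoiser` is a random stand-in for a trained denoiser, `pick_random` and `pick_max_confidence` mimic the random and probability-based baselines the abstract mentions, and `make_policy_picker` wraps a hypothetical scoring function standing in for the learned policy network.

```python
import random

MASK = None  # sentinel marking a masked position


def toy_denoiser(seq, vocab_size, rng):
    """Hypothetical stand-in for a denoiser: returns one probability
    distribution over the vocabulary for every position in `seq`."""
    probs = []
    for _ in seq:
        weights = [rng.random() + 1e-9 for _ in range(vocab_size)]
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs


def pick_random(masked, probs, rng):
    """Baseline: choose the next position uniformly at random."""
    return rng.choice(masked)


def pick_max_confidence(masked, probs, rng):
    """Heuristic: choose the masked position whose top denoiser
    probability is highest."""
    return max(masked, key=lambda i: max(probs[i]))


def make_policy_picker(policy_scores):
    """Wrap a hypothetical policy. `policy_scores(masked, probs)` must return
    one non-negative score per masked position (with a positive sum);
    the next position is sampled in proportion to those scores."""
    def pick(masked, probs, rng):
        scores = policy_scores(masked, probs)
        return rng.choices(masked, weights=scores, k=1)[0]
    return pick


def generate(length, vocab_size, pick_position, seed=0):
    """Unmask one token per step, starting from a fully masked sequence.
    Returns the final sequence and the order in which positions were unmasked."""
    rng = random.Random(seed)
    seq = [MASK] * length
    order = []
    while any(tok is MASK for tok in seq):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        probs = toy_denoiser(seq, vocab_size, rng)
        pos = pick_position(masked, probs, rng)
        # Fill the chosen position with the denoiser's most likely token.
        seq[pos] = max(range(vocab_size), key=lambda v: probs[pos][v])
        order.append(pos)
    return seq, order
```

Calling `generate(6, 4, pick_max_confidence)` returns the filled sequence together with the unmasking order, so the three selection rules can be compared by swapping only the `pick_position` argument. The sketch covers sampling only; the weighted training loss is not reproduced here because the abstract does not give its exact form.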

Jama Hussein Mohamud, Mohsin Hasan, Mirco Ravanelli, Yoshua Bengio • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Protein Sequence Generation | Protein sequences lengths 200-800 | pLDDT | 86.43 | 10 |
| Combinatorial Reasoning | 3-SAT | Accuracy | 90.9 | 8 |
| Combinatorial Reasoning | Sudoku | Accuracy | 92.87 | 8 |
| Protein Sequence Generation | DPLM 150M | Mean pLDDT | 84.94 | 7 |
