Cluster-Level Attention-Guided Parallel Decoding for Masked Diffusion Language Models

About

Masked diffusion language models (MDLMs) enable parallel decoding by predicting all masked positions at each denoising step, yet existing training-free samplers usually decide which positions to commit at token-level granularity. We revisit this granularity and observe that reliable predictions often emerge as contiguous high-confidence spans, suggesting that the unit of parallel commitment can be larger than a single token. We first group adjacent high-confidence candidates into confidence-induced clusters (CICs) as span-level update units. We then use self-attention maps from the same forward pass to estimate inter-cluster dependencies, enabling conflict-aware selection of mutually compatible CICs for parallel commitment. This yields CLAD (Cluster-Level Attention-Guided Decoding), a training-free cluster-level decoder for MDLMs. Experiments on LLaDA and Dream model families across four reasoning and code-generation benchmarks show that CLAD achieves 1.77x to 8.47x speedups over Vanilla decoding while maintaining broadly comparable task accuracy in most settings.
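The two steps described above (grouping adjacent high-confidence positions into clusters, then using attention to keep only mutually compatible clusters) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the confidence threshold `tau`, the dependency threshold `delta`, the symmetrized mean-attention dependency score, the greedy most-confident-first ordering, and the single-token fallback are all hypothetical choices made for illustration. `attn` is assumed to be a position-by-position attention matrix already aggregated over heads and layers.

```python
from typing import List, Optional, Sequence, Tuple

Cluster = Tuple[int, int]  # half-open span [start, end) of token positions


def find_clusters(confidence: Sequence[float], masked: Sequence[bool],
                  tau: float) -> List[Cluster]:
    """Group maximal runs of adjacent masked positions with confidence >= tau."""
    clusters: List[Cluster] = []
    start: Optional[int] = None
    for i, (conf, is_masked) in enumerate(zip(confidence, masked)):
        if is_masked and conf >= tau:
            if start is None:
                start = i  # open a new high-confidence run
        elif start is not None:
            clusters.append((start, i))  # run ends just before position i
            start = None
    if start is not None:  # run extends to the end of the sequence
        clusters.append((start, len(confidence)))
    return clusters


def dependency(attn: Sequence[Sequence[float]], a: Cluster, b: Cluster) -> float:
    """Hypothetical proxy: larger of the two directional mean attention weights."""
    def mean_attn(src: Cluster, dst: Cluster) -> float:
        total = sum(attn[i][j] for i in range(*src) for j in range(*dst))
        return total / ((src[1] - src[0]) * (dst[1] - dst[0]))
    return max(mean_attn(a, b), mean_attn(b, a))


def select_positions(confidence: Sequence[float], masked: Sequence[bool],
                     attn: Sequence[Sequence[float]],
                     tau: float = 0.9, delta: float = 0.1) -> List[int]:
    """Return the positions to unmask in this denoising step."""
    clusters = find_clusters(confidence, masked, tau)
    if not clusters:
        # Assumed fallback: commit the single most confident masked token.
        best = max((i for i, m in enumerate(masked) if m),
                   key=lambda i: confidence[i], default=None)
        return [] if best is None else [best]

    def mean_conf(c: Cluster) -> float:
        return sum(confidence[c[0]:c[1]]) / (c[1] - c[0])

    chosen: List[Cluster] = []
    # Greedy, most confident cluster first. The first candidate is always
    # accepted, so every step commits at least one span.
    for cand in sorted(clusters, key=mean_conf, reverse=True):
        # Skip a cluster that is strongly coupled to any cluster already chosen.
        if all(dependency(attn, cand, prev) < delta for prev in chosen):
            chosen.append(cand)
    return sorted(i for c in chosen for i in range(*c))


if __name__ == "__main__":
    conf = [0.95, 0.97, 0.40, 0.92, 0.93, 0.30]
    masked = [True] * 6
    # Toy attention: each position attends only to itself (no coupling).
    independent = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
    print(select_positions(conf, masked, independent))  # [0, 1, 3, 4]
    # Make positions 3-4 attend strongly to 0-1: the second span now conflicts.
    coupled = [row[:] for row in independent]
    for i in (3, 4):
        coupled[i][0], coupled[i][1], coupled[i][i] = 0.4, 0.4, 0.2
    print(select_positions(conf, masked, coupled))  # [0, 1]
```

In the toy run, both spans are committed together when they do not attend to each other, but only the more confident span is committed once the second one depends on it. Committing the dependent span would instead wait for a later step, after the span it attends to has been fixed.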

Heqiang Qi, Wei Huang, Mingyuan Bai, Xiangming Meng • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Code Generation | HumanEval (0-shot) | Accuracy | 54.27 | 69
Mathematical Reasoning | GSM8K (5-shot) | Accuracy | 81.65 | 54
Math Reasoning | MATH (4-shot) | Accuracy | 37.34 | 33
Code Generation | MBPP (3-shot) | Accuracy | 54.2 | 33
Reasoning | GSM8K (5-shot) | Accuracy | 77.79 | 12
