When to Commit? Towards Variable-Size Self-Contained Blocks for Discrete Diffusion Language Models
About
Discrete diffusion language models (dLLMs) enable parallel token updates with bidirectional attention, yet practical generation typically adopts blockwise semi-autoregressive decoding. This switch creates a training-inference mismatch: training denoises with full-sequence context, while inference commits tokens within a bounded block without future context. As a result, decoding with fixed-size or heuristic-based blocks can commit tokens prematurely, before the future context that could alter those choices is available. Motivated by this, we propose self-containedness as a principled criterion for block commitment. A block is self-contained if its predictions remain consistent whether they are made with access to future context (Future-Aware, FA) or without it (No-Future, NF). This reframes block boundary selection as a test of self-containedness rather than a heuristic choice. Based on this principle, we introduce Variable-size Self-contained Blocks (VSB) for dLLMs. VSB scores and selects block boundaries using the divergence between token-level predictive distributions under NF and FA conditioning, which quantifies how predictions would change if future context were revealed. We provide theoretical justification linking self-containedness to predictive consistency, and extensive experiments validate VSB's efficacy over fixed-size and heuristic blockwise decoding.
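The boundary-scoring idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which divergence is used, how scores are aggregated, or how FA distributions are obtained at decoding time. The sketch assumes KL divergence, a running mean over the block prefix, and a hypothetical `threshold` parameter, and it takes both sets of token-level distributions as plain inputs. The function names `kl_divergence`, `block_scores`, and `choose_block_length` are illustrative only.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def block_scores(nf_probs, fa_probs):
    """Score each candidate block end k by the mean KL(FA || NF) over tokens 0..k.

    nf_probs[i] and fa_probs[i] are the predictive distributions for token i
    under No-Future and Future-Aware conditioning (assumed to be given).
    """
    scores = []
    running = 0.0
    for k, (nf, fa) in enumerate(zip(nf_probs, fa_probs)):
        running += kl_divergence(fa, nf)
        scores.append(running / (k + 1))
    return scores


def choose_block_length(nf_probs, fa_probs, threshold=0.05):
    """Extend the block while the prefix score stays at or below `threshold`.

    Always returns at least 1 so that decoding makes progress.
    `threshold` is a hypothetical knob, not a value from the paper.
    """
    length = 0
    for score in block_scores(nf_probs, fa_probs):
        if score > threshold:
            break
        length += 1
    return max(length, 1)


# Toy example over a two-word vocabulary: the first two tokens are predicted
# identically with or without future context, while the third one flips.
nf = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]]
fa = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
block_len = choose_block_length(nf, fa)
```

In this toy case the first two tokens form a self-contained block under the assumed rule, because revealing future context leaves their distributions unchanged, while the third token is deferred to a later block.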
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Coding | HumanEval | Pass@1 | 46.95 | 168 |
| Mathematics | MATH 500 | Pass@1 | 39.6 | 122 |
| Code | MBPP | Pass@1 | 39.8 | 73 |
| General Knowledge | HellaSwag | Accuracy | 76.93 | 36 |
| General Knowledge | MMLU | Pass@1 | 63.11 | 31 |
| Math & Science | GSM8K | Pass@1 Accuracy | 83.4 | 9 |
| Math & Science | GPQA Diamond | Accuracy (Pass@1) | 28.79 | 9 |