When Confidence Misleads: Suffix Anchoring and Anchor-Proximity Confidence Modulation for Diffusion Language Models
About
Diffusion language models decode text by iteratively denoising masked token sequences, making the choice of which positions to decode a central inference-time decision. Most training-free decoding strategies use model confidence for position selection, assuming that high-confidence positions are ready to be decoded. In this work, we revisit this assumption by studying when confidence misleads fully non-autoregressive (fully non-AR) decoding. EOT tokens can receive high confidence and cause incomplete generation; inserting a suffix anchor can mitigate this issue but introduces local overconfidence near the anchor, causing anchor-adjacent tokens to be decoded too early. To address these issues, we propose Suffix-Anchored Confidence Modulation, a simple training-free method that inserts a short suffix anchor to encourage response completion and modulates confidence near the anchor according to decoding progress. This preserves the response-completion benefit of suffix anchoring while reducing premature decoding of anchor-adjacent tokens. Across text-only reasoning, vision-language reasoning, and code-generation benchmarks, our method consistently improves confidence-based fully non-AR decoding, outperforms explicit EOT suppression, and preserves the parallel decoding advantage of fully non-AR generation.
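The abstract describes the method only at a high level, so the following is a minimal sketch of what progress-dependent confidence modulation near a suffix anchor could look like, not the authors' implementation. The linear penalty and the names `modulate_confidence`, `select_positions`, `window`, and `strength` are all assumptions introduced for illustration. Confidence at masked positions just before the anchor is scaled down early in decoding, and the penalty fades as decoding progress approaches 1.

```python
from typing import List


def modulate_confidence(
    conf: List[float],
    masked: List[bool],
    anchor_start: int,
    progress: float,
    window: int = 4,       # hypothetical: how many positions before the anchor are affected
    strength: float = 0.5,  # hypothetical: maximum fractional penalty
) -> List[float]:
    """Down-weight confidence of masked positions close to the suffix anchor.

    Assumed form (not taken from the paper): a linear penalty that is
    largest right next to the anchor and at the start of decoding
    (progress = 0), and vanishes when progress = 1.
    """
    out = list(conf)
    for i in range(max(0, anchor_start - window), anchor_start):
        if not masked[i]:
            continue  # already-decoded positions are left untouched
        dist = anchor_start - 1 - i          # 0 for the position adjacent to the anchor
        proximity = 1.0 - dist / window      # 1.0 when adjacent, smaller further away
        penalty = strength * (1.0 - progress) * proximity
        out[i] = conf[i] * (1.0 - penalty)
    return out


def select_positions(conf: List[float], masked: List[bool], k: int) -> List[int]:
    """Pick the k masked positions with the highest confidence."""
    candidates = [i for i, is_masked in enumerate(masked) if is_masked]
    candidates.sort(key=lambda i: conf[i], reverse=True)
    return sorted(candidates[:k])


# Toy example: 6 masked response positions followed by a 2-token anchor.
conf = [0.60, 0.55, 0.50, 0.52, 0.58, 0.90, 1.0, 1.0]
masked = [True] * 6 + [False] * 2
early = modulate_confidence(conf, masked, anchor_start=6, progress=0.0)
print(select_positions(conf, masked, k=2))   # unmodulated choice
print(select_positions(early, masked, k=2))  # choice with early-stage modulation
```

In this toy setup, the unmodulated selection picks positions `[0, 5]` because the anchor-adjacent position 5 is overconfident, while the modulated selection at `progress=0.0` picks `[0, 1]`. At `progress=1.0` the penalty is zero and the original ranking is restored.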
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | pass@1 | 32.93 | 145 |
| Math Reasoning | GSM8K | Accuracy | 76.88 | 131 |
| Knowledge Reasoning | MMLU-Pro | Accuracy | 49.4 | 120 |
| Visual Question Answering | ChartQA (test) | Accuracy | 45.92 | 93 |
| Commonsense Reasoning | StrategyQA | Accuracy (%) | 76.13 | 24 |
| Code Generation | MBPP | Top-1 accuracy | 31.8 | 21 |
| Math Reasoning | MATH 500 | Accuracy | 30.8 | 14 |
| Vision-Language Reasoning | MathVista (test) | Accuracy | 34.6 | 7 |