
Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs

About

Diffusion Large Language Models (DLLMs) have emerged as a compelling alternative to autoregressive models, designed for fast parallel generation. However, existing DLLMs suffer a severe quality-speed trade-off, where faster parallel decoding leads to significant performance degradation. We attribute this to the irreversibility of standard decoding in DLLMs, which can lock generation onto a wrong trajectory as early errors accumulate in the context. To resolve this, we introduce Wide-In, Narrow-Out (WINO), a training-free decoding algorithm that enables revokable decoding in DLLMs. WINO employs a parallel draft-and-verify mechanism: it aggressively drafts multiple tokens while simultaneously using the model's bidirectional context to verify and re-mask suspicious ones for refinement. Evaluated on open-source DLLMs such as LLaDA and MMaDA, WINO decisively improves the quality-speed trade-off. For instance, on the GSM8K math benchmark, it accelerates inference by 6$\times$ while improving accuracy by 2.58%; on Flickr30K captioning, it achieves a 10$\times$ speedup with higher performance. Further comprehensive experiments demonstrate the superiority of WINO and provide an in-depth understanding of its behavior.
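The draft-and-verify loop described above can be sketched as follows. This is an illustrative toy, not the paper's implementation: `wino_step`, `predict`, `draft_k`, and `verify_tau` are hypothetical names, and the "model" is abstracted as a callable returning a (token, confidence) prediction per position.

```python
MASK = "<mask>"

def wino_step(tokens, predict, draft_k=4, verify_tau=0.6):
    """One Wide-In, Narrow-Out step (illustrative sketch, assumed interface).

    `predict(tokens)` stands in for the DLLM: it returns a dict mapping each
    position to a (token, confidence) pair computed with bidirectional context.
    """
    # --- Wide-In: aggressively draft the draft_k most confident masked slots ---
    preds = predict(tokens)
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    drafted = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:draft_k]
    new_tokens = list(tokens)
    for i in drafted:
        new_tokens[i] = preds[i][0]

    # --- Narrow-Out: re-score drafts in the updated context and revoke
    # suspicious ones by re-masking them for refinement in a later step ---
    verify = predict(new_tokens)
    for i in drafted:
        if verify[i][1] < verify_tau:
            new_tokens[i] = MASK
    return new_tokens
```

The key point is the asymmetry: many tokens enter per step (wide in), but only those that survive verification are kept (narrow out), which is what makes the decoding revokable rather than irreversible.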

Feng Hong, Geng Yu, Yushi Ye, Haicheng Huang, Huangjie Zheng, Ya Zhang, Yanfeng Wang, Jiangchao Yao • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Mathematical Reasoning | GSM8K | -- | -- | 177 |
| Reasoning | ARC | Accuracy | 85.31 | 83 |
| Code Generation | HumanEval | Accuracy (%) | 54.88 | 77 |
| Mathematical Reasoning | MATH500 | Accuracy | 44.4 | 32 |
| Mathematical Reasoning | GSM8K | Accuracy | 82.03 | 32 |
| Code Generation | MBPP | Accuracy | 57 | 32 |
| Code Generation | HumanEval | Accuracy | 56.71 | 24 |
| Symbolic Reasoning | Countdown | Accuracy | 25 | 24 |
