Accelerating Speculative Decoding with Block Diffusion Draft Trees

About

Speculative decoding accelerates autoregressive language models by using a lightweight drafter to propose multiple future tokens, which the target model then verifies in parallel. DFlash shows that a block diffusion drafter can generate an entire draft block in a single forward pass and achieve state-of-the-art speculative decoding performance, outperforming strong autoregressive drafters such as EAGLE-3. However, vanilla DFlash still verifies only a single drafted trajectory per round, which can limit its acceptance length. We introduce DDTree (Diffusion Draft Tree), a method that constructs a draft tree directly from the per-position distributions of a block diffusion drafter. Under a fixed node budget, DDTree uses a simple best-first heap algorithm to select the continuations most likely to match the target model, as scored by a surrogate derived from the draft model's output. The resulting tree is verified in a single target-model forward pass using an ancestor-only attention mask. Because DDTree builds on DFlash, a leading draft model for speculative decoding, the gains it obtains from tree verification place it among the leading approaches to speculative decoding.
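The tree construction and verification mask described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the surrogate score of a draft prefix is the product of the drafter's per-position probabilities for its tokens (accumulated as log-probabilities), treats each block position's distribution independently, and uses hypothetical helper names (`build_draft_tree`, `ancestor_mask`, `probs`, `budget`). Because every candidate pushed onto the heap scores no higher than the node that produced it, popping from a max-heap yields the highest-scoring prefix-closed tree within the node budget.

```python
import heapq
import math
from typing import List, Tuple


def build_draft_tree(probs: List[List[float]], budget: int) -> Tuple[List[int], List[int]]:
    """Best-first selection of up to `budget` draft-tree nodes (hypothetical helper).

    Assumption: probs[d][v] is the drafter's probability of token v at block
    position d, and a prefix is scored by the product of its token probabilities.
    Returns (tokens, parents); parents[i] == -1 means node i hangs off the
    last verified token. Parents always precede their children.
    """
    # Rank tokens at each position by descending probability.
    ranked = [sorted(range(len(p)), key=lambda v: -p[v]) for p in probs]
    logp = [[math.log(p[v]) if p[v] > 0 else -math.inf for v in r]
            for p, r in zip(probs, ranked)]

    heap: list = []
    counter = 0  # unique tie-breaker so heapq never compares past it

    def push(depth: int, rank: int, parent: int, parent_score: float) -> None:
        nonlocal counter
        if depth < len(logp) and rank < len(logp[depth]):
            score = parent_score + logp[depth][rank]
            if score > -math.inf:  # skip zero-probability tokens
                heapq.heappush(heap, (-score, counter, depth, rank, parent, parent_score))
                counter += 1

    tokens: List[int] = []
    parents: List[int] = []
    push(0, 0, -1, 0.0)  # best token at the first block position
    while heap and len(tokens) < budget:
        neg_score, _, depth, rank, parent, parent_score = heapq.heappop(heap)
        node = len(tokens)
        tokens.append(ranked[depth][rank])
        parents.append(parent)
        # First child: best token at the next position, extending this node.
        push(depth + 1, 0, node, -neg_score)
        # Next sibling: next-best token at this position under the same parent.
        push(depth, rank + 1, parent, parent_score)
    return tokens, parents


def ancestor_mask(parents: List[int]) -> List[List[bool]]:
    """mask[i][j] is True iff tree node i may attend to node j (itself or an ancestor)."""
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:
            mask[i][j] = True
            j = parents[j]
    return mask


tokens, parents = build_draft_tree([[0.7, 0.2, 0.1], [0.6, 0.4, 0.0]], budget=4)
print(tokens, parents)
```

With these toy probabilities and `budget=4`, the sketch keeps the prefixes whose surrogate scores are 0.7, 0.42, 0.28 and 0.2, returning `tokens == [0, 0, 1, 1]` and `parents == [-1, 0, 0, -1]`. Pushing only a node's first child and next sibling keeps the heap small. A practical version would operate on tensors, and tree tokens would also attend to the already-verified context, which this toy mask omits.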

Liran Ringel, Yaniv Romano · 2026

Related benchmarks

Task	Dataset	Metric	Result	
Instruction Following	Alpaca	Speedup (x)	3.36	173
Code Generation	HumanEval	Speedup (x)	8.22	147
Mathematical Reasoning	GSM8K	--	--	108
Speculative Decoding	GSM8K	Average Generation Length (τ)	9.27	81
Code Generation	MBPP	Speedup (x)	7.68	79
Speculative Decoding	LiveCodeBench	Speedup (x)	5.42	66
Speculative Decoding	MT-Bench	τ	6.06	53
Speculative Decoding	HumanEval	τ	9.65	36
Software Engineering	SWE-Bench Lite	Speedup (x)	4.38	36
Code Generation	HumanEval	TPS (tokens/s)	9.32	31

