
DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference

About

Speculative decoding is an effective and lossless approach for accelerating LLM inference. However, existing widely adopted model-based draft designs, such as EAGLE3, improve accuracy at the cost of multi-step autoregressive inference, resulting in high drafting latency and ultimately rendering the drafting stage itself a performance bottleneck. Inspired by diffusion-based large language models (dLLMs), we propose DART, which leverages parallel generation to reduce drafting latency. DART predicts logits for multiple future masked positions in parallel within a single forward pass based on hidden states of the target model, thereby eliminating autoregressive rollouts in the draft model while preserving a lightweight design. Based on these parallel logit predictions, we further introduce an efficient tree pruning algorithm that constructs high-quality draft token trees with N-gram-enforced semantic continuity. DART substantially reduces draft-stage overhead while preserving high draft accuracy, leading to significantly improved end-to-end decoding speed. Experimental results demonstrate that DART achieves a 2.03x--3.44x wall-clock time speedup across multiple datasets, surpassing EAGLE3 by 30% on average and offering a practical speculative decoding framework. Code is released at https://github.com/fvliang/DART.
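To make the draft-then-verify loop the abstract builds on concrete, here is a minimal greedy speculative-decoding sketch. This is not DART's implementation (DART drafts via parallel masked-position prediction and a pruned token tree); it only illustrates the lossless accept/reject mechanism that any such drafter plugs into. All function names and the toy integer "models" are illustrative assumptions.

```python
# Minimal greedy speculative-decoding sketch (illustrative, not DART itself).
# A draft model proposes k tokens; the target verifies them and accepts the
# longest agreeing prefix, which preserves the target's exact greedy output.

def greedy_speculative_step(target_next, draft_propose, context, k=4):
    """One decode step.

    target_next(ctx)      -> the target model's greedy next token for ctx.
    draft_propose(ctx, k) -> k draft tokens proposed for the next positions.
    Returns the tokens emitted this step (always >= 1, so progress is made).
    """
    draft = draft_propose(context, k)
    accepted = []
    ctx = list(context)
    for tok in draft:
        t = target_next(ctx)          # in practice: one batched forward pass
        if t != tok:                  # first disagreement: emit target's token
            accepted.append(t)
            return accepted
        accepted.append(tok)          # draft token verified, keep going
        ctx.append(tok)
    # every draft token was accepted; the target supplies one bonus token
    accepted.append(target_next(ctx))
    return accepted


# Toy deterministic "models" over an integer vocabulary: the target always
# continues with (last + 1); this draft gets only the first two tokens right.
def target_next(ctx):
    return ctx[-1] + 1

def draft_propose(ctx, k):
    out, last = [], ctx[-1]
    for i in range(k):
        last = last + 1 if i < 2 else last + 2   # diverges from position 2 on
        out.append(last)
    return out

print(greedy_speculative_step(target_next, draft_propose, [0], k=4))
# accepts drafts 1, 2 then emits the target's correction -> [1, 2, 3]
```

The speedup hinges entirely on how cheaply and accurately `draft_propose` runs, which is exactly the bottleneck DART targets: a multi-step autoregressive drafter (as in EAGLE3) pays k sequential passes here, whereas a parallel drafter pays one.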

Fuliang Liu, Xue Li, Ketai Zhao, Yinxi Gao, Ziyan Zhou, Zhonghui Zhang, Zhibin Wang, Wanchun Dou, Sheng Zhong, Chen Tian • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Inference Efficiency | HumanEval | Speedup Factor | 3.25 | 54
Code Generation | CodeAlpaca | Average Speed-up | 3.45 | 41
Generative Inference | MT-Bench | Speedup | 2.73 | 26
LLM Inference | Alpaca | Speedup | 2.95 | 21
LLM Inference | LiveCodeBench | Speedup | 2.81 | 21
LLM Inference | MATH500 | Speedup | 2.84 | 21
LLM Inference | MBPP | Speedup | 3.09 | 21
LLM Inference | Aggregate mean over Alpaca, CodeAlpaca, HumanEval, LiveCodeBench, MATH500, MBPP, MT-Bench | Mean Speedup | 2.87 | 21
LLM Inference | Alpaca, CodeAlpaca, HumanEval, LiveCodeBench, MATH500, MBPP, and MT-Bench | Speedup (Alpaca) | 2.61 | 8
