
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing

About

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive (AR) LLMs for text generation, with the potential to decode multiple tokens in a single iteration. However, no existing open-source dLLM has achieved superior inference speed over AR LLMs of similar size. This paper breaks that barrier with a simple and effective strategy named discrete diffusion forcing (D2F). D2F equips dLLMs with two key capabilities: (1) block-wise autoregressive generation, which enables KV cache utilization; and (2) prediction of subsequent tokens without requiring completion of prior blocks, which enables inter-block parallel decoding. In this way, vanilla dLLMs are refurbished into an AR-diffusion hybrid paradigm for efficient inference. D2F can be implemented via an asymmetric distillation process based on pre-trained dLLMs. We further propose a pipelined parallel decoding algorithm, which enables a trade-off between efficiency and efficacy. Empirically, D2F dLLMs achieve an inference speed more than $\mathbf{2.5\times}$ that of LLaMA3 and Qwen2.5 on GSM8K. Compared to vanilla dLLMs like LLaDA and Dream, the acceleration can exceed $\mathbf{50\times}$ while maintaining comparable output quality. The code is available at https://github.com/zhijie-group/Discrete-Diffusion-Forcing.
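The intuition behind the pipelined inter-block parallel decoding can be illustrated with a toy step-counting simulation. This is a minimal sketch, not the paper's implementation: the block size, number of blocks, the `START_RATIO` threshold, and the "one token finalized per step" assumption are all hypothetical simplifications of the actual confidence-based dLLM denoising process.

```python
# Toy sketch of D2F-style pipelined block decoding.
# Assumption: each parallel step finalizes one token per active block;
# real D2F runs dLLM denoising iterations with KV caching instead.

BLOCK = 4          # tokens per block (hypothetical)
NUM_BLOCKS = 3
START_RATIO = 0.5  # block i+1 may start once block i is 50% complete

def sequential_steps():
    # Vanilla block-wise decoding: blocks are denoised strictly one
    # after another, so total steps scale with total token count.
    return NUM_BLOCKS * BLOCK

def pipelined_steps():
    # Pipelined decoding: a later block starts denoising as soon as the
    # previous block reaches START_RATIO completion, overlapping work.
    progress = [0] * NUM_BLOCKS  # finalized tokens per block
    steps = 0
    while any(p < BLOCK for p in progress):
        steps += 1
        for i in range(NUM_BLOCKS):
            ready = i == 0 or progress[i - 1] >= START_RATIO * BLOCK
            if ready and progress[i] < BLOCK:
                progress[i] += 1  # one token finalized this step
    return steps

print(sequential_steps(), pipelined_steps())  # prints "12 6"
```

With these toy settings the overlap halves the number of decoding steps (12 vs. 6); in the paper the actual speedup over vanilla dLLMs comes from the same overlap principle combined with KV caching and parallel token prediction within blocks.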

Xu Wang, Chenkai Xu, Yijie Jin, Jiachun Jin, Hao Zhang, Zhijie Deng • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | -- | -- | 850 |
| Mathematical Reasoning | MATH | Accuracy | 38.62 | 535 |
| Mathematical Reasoning | GSM8K | -- | -- | 177 |
| Code Generation | MBPP | Accuracy (%) | 53.48 | 146 |
| Mathematical Reasoning | GSM8K | TPS | 89.82 | 26 |
| Code Generation | MBPP | TPF | 230 | 9 |
| Mathematical Reasoning | MATH (test) | TPF | 2.6 | 9 |
| Code Generation | HumanEval | TPF | 2.5 | 9 |
| Mathematical Reasoning | GSM8K (test) | TPF | 3.1 | 9 |
| Code Generation | MBPP | TPF | 3.13 | 6 |

Showing 10 of 12 rows.
