Fast-dLLM v2: Efficient Block-Diffusion LLM

About

Autoregressive (AR) large language models (LLMs) have achieved remarkable performance across a wide range of natural language tasks, yet their inherent sequential decoding limits inference efficiency. In this work, we propose Fast-dLLM v2, a carefully designed block diffusion language model (dLLM) that efficiently adapts pretrained AR models into dLLMs for parallel text generation, requiring only approximately 1B tokens of fine-tuning. This represents a 500x reduction in training data compared to full-attention diffusion LLMs such as Dream (580B tokens), while preserving the original model's performance. Our approach introduces a novel training recipe that combines a block diffusion mechanism with a complementary attention mask, enabling blockwise bidirectional context modeling without sacrificing AR training objectives. To further accelerate decoding, we design a hierarchical caching mechanism: a block-level cache that stores historical context representations across blocks, and a sub-block cache that enables efficient parallel generation within partially decoded blocks. Coupled with our parallel decoding pipeline, Fast-dLLM v2 achieves up to 2.5x speedup over standard AR decoding without compromising generation quality. Extensive experiments across diverse benchmarks demonstrate that Fast-dLLM v2 matches or surpasses AR baselines in accuracy, while delivering state-of-the-art efficiency among dLLMs - marking a significant step toward the practical deployment of fast and accurate LLMs. Code and model will be publicly released.

Chengyue Wu, Hao Zhang, Shuchen Xue, Shizhe Diao, Yonggan Fu, Zhijian Liu, Pavlo Molchanov, Ping Luo, Song Han, Enze Xie• 2025

Related benchmarks

Task	Dataset	Result
Instruction Following	IFEval	--	836
Code Generation	HumanEval (test)	Pass@182.3	612
Code Generation	MBPP (test)	--	405
Mathematical Reasoning	GSM8K	Speed Up (x)3.1	246
Code Generation	HumanEval	Accuracy61.7	217
Mathematical Reasoning	GSM8K	--	204
Code Generation	MBPP	Pass@178.2	193
Mathematical Reasoning	MATH500	Accuracy (ACC)59.4	133
Mathematical Reasoning	GSM8K	Accuracy83.7	95
Code Reasoning	LiveCodeBench	Accuracy6.8	90

Showing 10 of 33 rows

Other info

Follow for update

@wizwand_team Discord