dParallel: Learnable Parallel Decoding for dLLMs

About

Diffusion large language models (dLLMs) have recently drawn considerable attention within the research community as a promising alternative to autoregressive generation, offering parallel token prediction and lower inference latency. Yet their parallel decoding potential remains largely underexplored, as existing open-source models still need a number of decoding steps close to the sequence length to maintain performance. To address this, we introduce dParallel, a simple and effective method that unlocks the inherent parallelism of dLLMs for fast sampling. We identify that the key bottleneck to parallel decoding is the sequential convergence of certainty for masked tokens. Building on this insight, we introduce the core of our approach: certainty-forcing distillation, a novel training strategy that distills the model to follow its original sampling trajectories while forcing it to reach high certainty on masked tokens more rapidly and in parallel. Extensive experiments across various benchmarks demonstrate that our method can dramatically reduce the number of decoding steps while maintaining performance. When applied to the LLaDA-8B-Instruct model, dParallel reduces decoding steps from 256 to 30 on GSM8K, achieving an 8.5x speedup without performance degradation. On the MBPP benchmark, it cuts decoding steps from 256 to 24, resulting in a 10.5x speedup while maintaining accuracy. Our code is available at https://github.com/czg1225/dParallel
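To make the link between token certainty and step count concrete, here is a minimal sketch of confidence-threshold parallel decoding. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the decoding rule, so the threshold rule, the names `toy_predict` and `parallel_decode`, and the toy confidence model are all hypothetical.

```python
import random

MASK = None  # placeholder for a position that has not been decoded yet


def toy_predict(tokens, rng):
    """Hypothetical stand-in for a dLLM forward pass (not a real model).

    Returns {position: (token, confidence)} for every masked position.
    Confidence grows with the fraction of positions already decoded,
    loosely mimicking certainty that converges as context fills in.
    """
    decoded = sum(t is not MASK for t in tokens)
    base = decoded / len(tokens)
    return {
        i: (f"tok{i}", min(1.0, base + rng.random()))
        for i, t in enumerate(tokens)
        if t is MASK
    }


def parallel_decode(length, threshold, predict=toy_predict, seed=0):
    """Commit every masked position whose confidence reaches `threshold`.

    Assumed rule: if no position qualifies, commit only the single most
    confident one, so the loop finishes in at most `length` steps.
    """
    rng = random.Random(seed)
    tokens = [MASK] * length
    steps = 0
    while any(t is MASK for t in tokens):
        preds = predict(tokens, rng)
        chosen = [i for i, (_, conf) in preds.items() if conf >= threshold]
        if not chosen:
            chosen = [max(preds, key=lambda i: preds[i][1])]
        for i in chosen:
            tokens[i] = preds[i][0]
        steps += 1
    return tokens, steps
```

In this toy setup, a threshold above 1.0 can never be met, so decoding falls back to one token per step and takes `length` steps, while a threshold of 0.0 commits everything in a single step. Under this reading, a model whose masked tokens reach high certainty earlier crosses a fixed threshold at more positions per step, which is how fewer decoding steps could follow from faster certainty convergence.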

Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, Xinchao Wang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Mathematical Reasoning | MATH500 (test) | Accuracy 52.6 | 514 |
| Code Generation | HumanEval (test) | -- | 506 |
| Code Generation | MBPP (test) | -- | 298 |
| Radiology Report Generation | MIMIC-CXR (test) | -- | 172 |
| Code Generation | HumanEval | Accuracy 54.3 | 99 |
| Radiology Report Generation | CheXpert Plus (test) | -- | 88 |
| Code Generation | HumanEval | Acc 31.71 | 65 |
| Mathematical Reasoning | GSM8K (test) | Accuracy 0.7657 | 48 |
| Mathematical Reasoning | MATH | -- | 42 |
| Code Generation | HumanEval | TPS 130.3 | 41 |
Showing 10 of 31 rows
