Advancing Reasoning in Diffusion Language Models with Denoising Process Rewards

About

Diffusion-based large language models offer a non-autoregressive alternative for text generation, but enabling them to perform complex reasoning remains challenging. Reinforcement learning has recently emerged as an effective post-training strategy for improving their performance. However, existing methods rely primarily on outcome-based rewards, which provide no direct supervision over the denoising process and often result in poorly structured reasoning that is difficult to interpret and does not consistently support the final prediction. To address this limitation, we introduce the denoising process reward, a process-level reinforcement signal defined over the denoising trajectory of diffusion language models. This reward is obtained by estimating the contribution of intermediate denoising intervals to the final task outcome, encouraging the model to favor reasoning trajectories that consistently guide generation toward correct predictions. We further propose an efficient stochastic estimator that reuses standard training rollouts, enabling practical process-level supervision at scale. Experiments on challenging reasoning benchmarks demonstrate that our approach yields consistent improvements in reasoning stability, interpretability, and overall task performance.
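The abstract does not spell out how an interval's contribution is estimated, so the following is only a toy sketch of one plausible reading, not the paper's estimator. It assumes the value of a partially denoised state can be approximated by the mean outcome reward of a few stochastic completions, and that the contribution of an interval is the change in that value across it. The names `interval_contributions`, `checkpoints`, and `complete_and_score` are hypothetical, and unlike the estimator described above, this sketch draws fresh completions rather than reusing training rollouts.

```python
import random
from typing import Callable, List, Sequence


def interval_contributions(
    checkpoints: Sequence[object],
    complete_and_score: Callable[[object, random.Random], float],
    num_samples: int = 8,
    seed: int = 0,
) -> List[float]:
    """Toy Monte Carlo estimate of per-interval contributions.

    Assumption (not from the paper): the value of a checkpoint is the mean
    outcome reward over `num_samples` stochastic completions started from it,
    and the contribution of interval k -> k+1 is value[k+1] - value[k].
    """
    rng = random.Random(seed)
    values = [
        sum(complete_and_score(state, rng) for _ in range(num_samples)) / num_samples
        for state in checkpoints
    ]
    # Differences of consecutive values; they telescope to values[-1] - values[0].
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical usage: each "state" is just a number standing in for the
# probability that finishing the denoising from that state yields a correct answer.
def toy_scorer(state: float, rng: random.Random) -> float:
    return 1.0 if rng.random() < state else 0.0


if __name__ == "__main__":
    print(interval_contributions([0.1, 0.3, 0.8, 1.0], toy_scorer, num_samples=64))
```

Under this reading, an interval with a large positive contribution is one where the trajectory moved substantially toward a correct final answer, which is the kind of signal a process-level reward could attach to intermediate denoising steps.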

Shaoan Xie, Lingjing Kong, Xiangchen Song, Xinshuai Dong, Guangyi Chen, Eric P. Xing, Kun Zhang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Mathematical Reasoning | Countdown | Accuracy | 56.3 | 168 |
| Logical Reasoning | Sudoku | Accuracy | 22.4 | 119 |
| Commonsense Reasoning | ARC | Accuracy | 93 | 28 |
| Advanced Mathematical Reasoning | Math500 256 tokens | Pass@1 Accuracy | 38.6 | 15 |
| Grade School Math Word Problems | GSM8k 256 tokens | Pass@1 | 80.6 | 15 |
| Grade School Math Word Problems | GSM8k 512 tokens | Pass@1 | 82.1 | 15 |
| Arithmetic Reasoning | Countdown 512 tokens | Pass@1 | 56.3 | 15 |
| Arithmetic Reasoning | Countdown 256 tokens | Pass@1 | 54.2 | 15 |
| Advanced Mathematical Reasoning | Math500 512 tokens | Pass@1 Accuracy | 40.4 | 15 |
| Sudoku Solving | Sudoku 256 tokens | Pass@1 | 21.2 | 15 |
Showing 10 of 11 rows
