Inference-Time Scaling of Diffusion Language Models via Trajectory Refinement

About

Discrete diffusion models have recently emerged as strong alternatives to autoregressive language models, matching their performance through large-scale training. However, inference-time control remains relatively underexplored. In this work, we study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), an inference-time algorithm enabling trajectory-level refinement. PG-DLM constructs a Markov chain over full denoising trajectories and applies a conditional sequential Monte Carlo kernel to resample them. By doing so, PG-DLM introduces a new scaling axis, the number of refinement iterations, which is unavailable to prior methods. Increasing iterations remains effective even as gains from adding more parallel samples saturate. Furthermore, PG-DLM enables adaptive compute allocation by performing additional iterations only when needed, leading to further efficiency gains. We derive theoretical guarantees for convergence and variance bounds, and analyze trade-offs across different scaling axes. Empirically, PG-DLM outperforms prior methods across compute budgets on reward-guided generation tasks. On GSM8K, it achieves 90.07% accuracy with 2.9 particles on average and 94.47% accuracy with 16 particles.
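To make the trajectory-level refinement loop concrete, here is a minimal toy sketch of particle Gibbs with a conditional sequential Monte Carlo sweep. It is not the authors' implementation, and it rests on several assumptions that do not come from the abstract: a uniform token proposal stands in for the diffusion denoiser, tokens are unmasked left to right with one token per step, the reward simply counts tokens equal to 1, and every name (`reward`, `conditional_smc`, `particle_gibbs`, `beta`) is hypothetical. The number of particles `k` and the number of refinement iterations `m` correspond to the two scaling axes the abstract contrasts.

```python
import math
import random


def reward(tokens):
    # Toy reward (assumption): number of tokens equal to 1.
    return sum(1 for tok in tokens if tok == 1)


def conditional_smc(reference, length, vocab, k, beta, rng):
    """One SMC sweep over k particles.

    If `reference` is given, particle 0 is pinned to it (conditional SMC);
    otherwise all particles are free (plain SMC, used for the first sweep).
    """
    particles = [[] for _ in range(k)]
    weights = [1.0] * k
    for t in range(length):
        new_particles = []
        for i in range(k):
            if i == 0 and reference is not None:
                # The reference particle keeps its own lineage.
                new_particles.append(list(reference[: t + 1]))
            else:
                # Resample an ancestor by weight, then unmask one more token.
                # The uniform draw is a stand-in for a denoiser (assumption).
                parent = rng.choices(particles, weights=weights)[0]
                new_particles.append(parent + [rng.randrange(vocab)])
        # Incremental weight: exponentiated reward gain from the new token.
        weights = [
            math.exp((reward(p) - reward(p[:-1])) / beta) for p in new_particles
        ]
        particles = new_particles
    # Draw the next reference trajectory in proportion to the final weights.
    return rng.choices(particles, weights=weights)[0]


def particle_gibbs(length=8, vocab=4, k=4, m=5, beta=0.5, seed=0):
    """Run m refinement iterations, each conditioned on the previous output."""
    rng = random.Random(seed)
    reference = None
    for _ in range(m):
        reference = conditional_smc(reference, length, vocab, k, beta, rng)
    return reference


if __name__ == "__main__":
    sample = particle_gibbs()
    print(sample, reward(sample))
```

In this sketch, raising `m` reruns the sweep with the previous output pinned as particle 0, while raising `k` adds parallel particles within a single sweep. Because each particle stores its full partial sequence, no separate ancestry traceback is needed to recover the selected trajectory.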

Meihua Dang, Jiaqi Han, Minkai Xu, Kai Xu, Akash Srivastava, Stefano Ermon • 2025

Related benchmarks

Task | Dataset | Result | Rank
Sentiment Steering | 15 prefix prompts length 50 | Sentiment Accuracy: 69.4 | 11
Toxicity Steering | 15 prefix prompts length 50 | Toxicity Accuracy: 9 | 11
Sentiment Steering | MDLM long sequence generation 512 length (test) | Steering Accuracy: 26 | 6
Toxicity Steering | MDLM long sequence generation 512 length (test) | Steering Accuracy: 3 | 6