
Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling

About

Discrete diffusion models have recently emerged as strong alternatives to autoregressive language models, matching their performance through large-scale training. However, inference-time control remains relatively underexplored. In this work, we study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity under reward optimization. PG-DLM constructs a Markov chain over full denoising trajectories and applies a conditional sequential Monte Carlo kernel to resample them. We derive theoretical guarantees for convergence, including asymptotic consistency and variance bounds. Within this framework, we further analyze trade-offs across four key axes for inference-time scaling under fixed budgets: iterations, samples, denoising steps, and reward estimation. Our analysis shows that scaling iterations achieves the best reward-perplexity trade-off. Empirically, with MDLM and LLaDA-8B as base models, PG-DLM consistently outperforms prior methods across a wide range of compute budgets on reward-guided generation tasks, including toxicity and sentiment control as well as linguistic acceptability.
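The abstract describes PG-DLM only at a high level. As a rough illustration of the general particle Gibbs idea it builds on (a Markov chain over whole denoising trajectories, updated by a conditional SMC kernel), here is a minimal sketch in Python. It is not the authors' implementation. The "denoiser" is a hypothetical toy that unmasks one random position per step and samples a binary token, the reward simply counts 1-tokens, and every name (`denoise_step`, `partial_reward`, `csmc`, `particle_gibbs`, `p_one`, `beta`) is made up for this example. Intermediate weights use the reward of the partially unmasked sequence as a crude stand-in for reward estimation.

```python
import math
import random

MASK = -1  # hypothetical mask token id


def denoise_step(x, p_one, rng):
    """Toy stand-in for a diffusion LM step (hypothetical): unmask one random
    masked position and fill it with token 1 w.p. p_one, else token 0."""
    masked = [i for i, tok in enumerate(x) if tok == MASK]
    i = rng.choice(masked)
    y = list(x)
    y[i] = 1 if rng.random() < p_one else 0
    return tuple(y)


def partial_reward(x):
    """Toy reward: count of unmasked 1-tokens. On a fully unmasked sequence
    this is the final reward; on a partial one it is a crude estimate."""
    return sum(1 for tok in x if tok == 1)


def csmc(reference, n_particles, length, p_one, beta, rng):
    """One conditional SMC sweep over whole trajectories. Particle 0 is pinned
    to the reference trajectory; the others are proposed from the base
    denoiser and resampled with reward-twisted weights."""
    start = tuple([MASK] * length)
    trajs = [[start] for _ in range(n_particles)]
    for t in range(length):
        for k in range(n_particles):
            nxt = reference[t + 1] if k == 0 else denoise_step(trajs[k][-1], p_one, rng)
            trajs[k].append(nxt)
        # Incremental log-weight: change in (estimated) reward, scaled by 1/beta.
        logw = [(partial_reward(tr[-1]) - partial_reward(tr[-2])) / beta for tr in trajs]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        if t < length - 1:
            # Multinomial resampling of the N-1 free particles; the reference keeps slot 0.
            ancestors = rng.choices(range(n_particles), weights=w, k=n_particles - 1)
            trajs = [trajs[0]] + [list(trajs[a]) for a in ancestors]
    # Draw the next reference trajectory in proportion to the final weights.
    idx = rng.choices(range(n_particles), weights=w, k=1)[0]
    return trajs[idx]


def particle_gibbs(n_iters, n_particles, length, p_one=0.3, beta=0.5, seed=0):
    """Markov chain over full trajectories whose stationary law (on x_0) is the
    base distribution tilted by exp(reward(x_0) / beta)."""
    rng = random.Random(seed)
    x = tuple([MASK] * length)
    ref = [x]
    for _ in range(length):  # initial reference: one unguided base trajectory
        x = denoise_step(x, p_one, rng)
        ref.append(x)
    history = []
    for _ in range(n_iters):
        ref = csmc(ref, n_particles, length, p_one, beta, rng)
        history.append(partial_reward(ref[-1]))
    return ref, history


if __name__ == "__main__":
    ref, history = particle_gibbs(n_iters=200, n_particles=8, length=8)
    print("final sequence:", ref[-1])
    print("mean reward after burn-in:", sum(history[50:]) / len(history[50:]))
```

In this toy, each `csmc` sweep costs about `(n_particles - 1) * length` calls to `denoise_step`, so under a fixed budget one can trade `n_iters` against `n_particles` or `length` (here equal to the number of denoising steps). These loosely mirror three of the four axes named in the abstract; replacing `partial_reward` with a better estimator would touch the fourth.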

Meihua Dang, Jiaqi Han, Minkai Xu, Kai Xu, Akash Srivastava, Stefano Ermon • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Sentiment Steering | 15 prefix prompts, length 50 | Sentiment Accuracy | 69.4 | 11
Toxicity Steering | 15 prefix prompts, length 50 | Toxicity Accuracy | 9 | 11
Sentiment Steering | MDLM long sequence generation, length 512 (test) | Steering Accuracy | 26 | 6
Toxicity Steering | MDLM long sequence generation, length 512 (test) | Steering Accuracy | 3 | 6
