Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion

About

We present a discrete diffusion-based language model using Glauber dynamics from statistical physics. Our main insight is that instead of training a discrete state space diffusion model using Glauber dynamics with a uniform transition kernel as the forward process, one can set up an "energy function" based on pretrained causal or masked language models. Using the distribution defined by this energy function as the stationary distribution significantly improves the quality of the generated text. Incorporating UL2 as the pretrained model into our diffusion pipeline, we outperform prior diffusion-based LMs and perform competitively with autoregressive models of comparable size. Furthermore, our models are competitive with or outperform prior diffusion models and GPT-2-style autoregressive models on zero-shot common sense reasoning tasks as well as planning and search tasks such as Sudoku and Zebra puzzles.
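To make the term concrete, here is a minimal sketch of generic Glauber (heat-bath) dynamics over a token sequence: pick a position uniformly at random, then resample it from its conditional under a distribution proportional to exp(-energy). This is not the authors' pipeline. The names `toy_energy`, `glauber_step`, and `run_glauber` are hypothetical, and the toy energy merely stands in for an energy that the paper derives from a pretrained language model.

```python
import math
import random


def toy_energy(tokens):
    """Hypothetical stand-in for an LM-based energy.

    Counts adjacent positions whose tokens differ, so constant
    sequences have the lowest energy. A real energy would come
    from a pretrained language model's scores (assumption).
    """
    return sum(1.0 for a, b in zip(tokens, tokens[1:]) if a != b)


def glauber_step(tokens, vocab, energy, rng):
    """One Glauber update: resample a single uniformly chosen position
    from its conditional distribution under p(x) ~ exp(-energy(x))."""
    i = rng.randrange(len(tokens))
    # Every candidate agrees with `tokens` except possibly at position i.
    candidates = [tokens[:i] + [v] + tokens[i + 1:] for v in vocab]
    energies = [energy(c) for c in candidates]
    e_min = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - e_min)) for e in energies]
    return rng.choices(candidates, weights=weights, k=1)[0]


def run_glauber(length, vocab, energy, steps, seed=0):
    """Start from uniformly random tokens and apply `steps` Glauber updates."""
    rng = random.Random(seed)
    tokens = [rng.choice(vocab) for _ in range(length)]
    for _ in range(steps):
        tokens = glauber_step(tokens, vocab, energy, rng)
    return tokens


sample = run_glauber(length=8, vocab=["a", "b", "c"], energy=toy_energy, steps=200)
```

Each step enumerates the whole vocabulary at one position, which is only practical for a toy vocabulary; with a language model, the per-position conditional would presumably come from the model's own predictions.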

Tarun Kathuria, Sachin Kumar • 2026

Related benchmarks

Task	Dataset	Result
Language Modeling	WikiText-103 (val)	PPL 20.83	290
Commonsense Reasoning	SIQA	Accuracy 39.1	183
Language Modeling	WikiText-2 (val)	Perplexity (BVS) 20.35	179
Commonsense Reasoning	WinoGrande	Accuracy 52.9	103
Common Sense Reasoning	PIQA	Accuracy 68.9	100
Common Sense Reasoning	HellaSwag	Accuracy (acc_n) 40.5	47
Language Modeling	LAMBADA (val)	Perplexity 10.14	39
Sudoku	Sudoku	Accuracy 91.82	19
Unconditional Generation	Unconditional Generations 1024	PPL (GPT-NEO) 7.8	15
Language Modeling	One Billion Word Benchmark (val)	Perplexity 44.12	11

Showing 10 of 12 rows
