
Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion

About

We present a discrete diffusion-based language model built on Glauber dynamics from statistical physics. Our main insight is that, instead of training a discrete state-space diffusion model that uses Glauber dynamics with a uniform transition kernel as the forward process, one can define an "energy function" based on pretrained causal or masked language models. Treating the distribution induced by this energy function as the stationary distribution significantly improves the quality of the generated text. Incorporating UL2 as the pretrained model in our diffusion pipeline, we outperform prior diffusion-based LMs and perform competitively with autoregressive models of comparable size. Furthermore, our models are competitive with or outperform prior diffusion models and GPT-2-style autoregressive models on zero-shot commonsense reasoning tasks, as well as on planning and search tasks such as Sudoku and Zebra puzzles.
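
To make the idea of Glauber dynamics over token sequences concrete, here is a minimal sketch of a single-site (heat-bath) sampler. It is not the authors' implementation: `toy_energy`, `glauber_step`, `glauber_sample`, and the inverse temperature `beta` are hypothetical names chosen for illustration, and the toy energy simply penalizes adjacent tokens that differ. In the approach described above, the energy would instead be derived from a pretrained language model.

```python
import math
import random


def toy_energy(seq):
    """Hypothetical stand-in energy: number of adjacent positions whose tokens differ."""
    return sum(1.0 for a, b in zip(seq, seq[1:]) if a != b)


def glauber_step(seq, vocab_size, energy, beta, rng):
    """One heat-bath update: resample one random position from its conditional."""
    i = rng.randrange(len(seq))
    # Energy of every candidate sequence that differs from seq only at position i.
    energies = [energy(seq[:i] + [v] + seq[i + 1:]) for v in range(vocab_size)]
    # Conditional p(v) is proportional to exp(-beta * E); shift by the minimum
    # energy so the largest weight is exactly 1 (numerical stability).
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    new_token = rng.choices(range(vocab_size), weights=weights, k=1)[0]
    return seq[:i] + [new_token] + seq[i + 1:]


def glauber_sample(length, vocab_size, energy, beta=2.0, steps=2000, seed=0):
    """Run the chain from a uniformly random sequence for a fixed number of steps."""
    rng = random.Random(seed)
    seq = [rng.randrange(vocab_size) for _ in range(length)]
    for _ in range(steps):
        seq = glauber_step(seq, vocab_size, energy, beta, rng)
    return seq


if __name__ == "__main__":
    sample = glauber_sample(length=8, vocab_size=3, energy=toy_energy)
    print(sample, toy_energy(sample))
```

Each step changes at most one token, and the chain's stationary distribution is proportional to `exp(-beta * energy(seq))`, which is why the choice of energy function determines what the sampler eventually generates.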

Tarun Kathuria, Sachin Kumar • 2026

Related benchmarks

Task                      Dataset                             Metric              Result   Rank
Language Modeling         WikiText-103 (val)                  PPL                 20.83    261
Commonsense Reasoning     SIQA                                Accuracy            39.1     168
Commonsense Reasoning     WinoGrande                          Accuracy            52.9     103
Common Sense Reasoning    PIQA                                Accuracy            68.9     100
Language Modeling         WikiText-2 (val)                    Perplexity (BVS)    20.35    70
Common Sense Reasoning    HellaSwag                           Accuracy (acc_n)    40.5     47
Language Modeling         LAMBADA (val)                       Perplexity          10.14    39
Sudoku                    Sudoku                              Accuracy            91.82    19
Unconditional Generation  Unconditional Generations 1024      PPL (GPT-NEO)       7.8      15
Language Modeling         One Billion Word Benchmark (val)    Perplexity          44.12    11
Showing 10 of 12 rows
