Esoteric Language Models: A Family of Any-Order Diffusion LLMs

About

Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Within this family, Masked Diffusion Models (MDMs) currently perform best, but they still underperform AR models in perplexity and lack key inference-time efficiency features, most notably KV caching. We introduce Eso-LMs, a new family of models that fuses the AR and MDM paradigms, smoothly interpolating between their perplexities while overcoming their respective limitations. Unlike prior work, which uses transformers with bidirectional attention as MDM denoisers, we exploit the connection between MDMs and Any-Order autoregressive models and adopt causal attention. For the first time, this design lets us compute the exact likelihood of MDMs and, crucially, introduce KV caching for MDMs while preserving parallel generation, significantly improving inference efficiency. Combined with an optimized sampling schedule, Eso-LMs establish a new state of the art on the speed-quality Pareto frontier for unconditional generation. We provide the code, model checkpoints, and a video tutorial on the project page: https://s-sahoo.com/Eso-LMs.
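
The link between MDMs and any-order autoregressive models mentioned in the abstract can be illustrated with a small sketch. This is not the Eso-LM implementation: `toy_conditional` is a hypothetical stand-in for a denoiser, and the vocabulary size and generation order are made up for illustration. It only shows that once positions are generated in a fixed order, each conditioned on the tokens already revealed, the chain rule gives an exact log-likelihood. Because the revealed context only grows, a real model with causal attention could plausibly reuse earlier computation, which is the intuition behind KV caching.

```python
import math

VOCAB = 3  # toy vocabulary size (assumption, for illustration only)


def toy_conditional(position, context):
    """Hypothetical stand-in for a denoiser.

    Returns the distribution p(x_position = v | context) as a list over v,
    where `context` maps already-generated positions to their tokens.
    """
    # Any strictly positive weights give a valid distribution after normalizing.
    shift = position + sum(context.values())
    weights = [1.0 + ((v + shift) % VOCAB) for v in range(VOCAB)]
    total = sum(weights)
    return [w / total for w in weights]


def any_order_log_likelihood(tokens, order):
    """Exact log p(tokens) when positions are generated in the given order."""
    context = {}
    log_p = 0.0
    for pos in order:
        probs = toy_conditional(pos, context)
        log_p += math.log(probs[tokens[pos]])
        context[pos] = tokens[pos]  # the revealed context only ever grows
    return log_p


# Score one sequence under a non left-to-right order.
print(any_order_log_likelihood([2, 0, 1], order=[2, 0, 1]))
```

Summing the resulting probabilities over all sequences of a given length yields 1 for any fixed order, which is what makes the likelihood exact rather than a bound.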

Subham Sekhar Sahoo, Zhihan Yang, Yash Akhauri, Johnna Liu, Deepansha Singh, Zhoujun Cheng, Zhengzhong Liu, Eric Xing, John Thickstun, Arash Vahdat • 2025

Related benchmarks

Task               Dataset             Result               Rank
Language Modeling  OpenWebText         Perplexity 23.1      122
Language Modeling  PTB (val)           Perplexity 97.46     107
Language Modeling  OWT (test)          Perplexity (PPL) 18.46  79
Language Modeling  LM1B (val)          Perplexity 60.11     67
Language Modeling  WikiText (val)      Perplexity 35.65     62
Language Modeling  LM1B                Perplexity 30.8      39
Language Modeling  LAMBADA (val)       Perplexity 57.33     39
Language Modeling  AG News (val)       Perplexity 65.26     36
Language Modeling  ArXiv (val)         Perplexity 53.78     34
Language Modeling  LM1B L=128 (test)   NELBO PPL 24.53      17
Showing 10 of 15 rows
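
For readers unfamiliar with the metrics above, the sketch below shows how perplexity is conventionally derived from per-token negative log-likelihoods (in nats). The function name and the sample values are illustrative assumptions, not taken from the paper's code. When the per-token values are upper bounds on the true negative log-likelihood (as with a NELBO), the same formula gives an upper bound on perplexity, which is how a "NELBO PPL" figure is usually read.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# A model that assigns probability 1/4 to every token has perplexity 4.
uniform_nlls = [math.log(4)] * 5
print(perplexity(uniform_nlls))
```

Lower is better: a perplexity of k roughly means the model is as uncertain as a uniform choice among k tokens at each step.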
