Energy-Based Diffusion Language Models for Text Generation

About

Despite remarkable progress in autoregressive language models, alternative generative paradigms beyond left-to-right generation are still being actively explored. Discrete diffusion models, with the capacity for parallel generation, have recently emerged as a promising alternative. Unfortunately, these models still underperform the autoregressive counterparts, with the performance gap increasing when reducing the number of sampling steps. Our analysis reveals that this degradation is a consequence of an imperfect approximation used by diffusion models. In this work, we propose Energy-based Diffusion Language Model (EDLM), an energy-based model operating at the full sequence level for each diffusion step, introduced to improve the underlying approximation used by diffusion models. More specifically, we introduce an EBM in a residual form, and show that its parameters can be obtained by leveraging a pretrained autoregressive model or by finetuning a bidirectional transformer via noise contrastive estimation. We also propose an efficient generation algorithm via parallel important sampling. Comprehensive experiments on language modeling benchmarks show that our model can consistently outperform state-of-the-art diffusion models by a significant margin, and approaches autoregressive models' perplexity. We further show that, without any generation performance drop, our framework offers a 1.3$\times$ sampling speedup over existing diffusion models. Reproduced code is available at https://github.com/MinkaiXu/Energy-Diffusion-LLM.

Minkai Xu, Tomas Geffner, Karsten Kreis, Weili Nie, Yilun Xu, Jure Leskovec, Stefano Ermon, Arash Vahdat• 2024

Related benchmarks

Task	Dataset	Result
Language Modeling	PTB (test)	Perplexity89.67	543
Language modelling	LM1B (test)	Perplexity60.23	151
Language Modeling	arXiv (test)	PPL36.63	145
Language Modeling	LAMBADA (test)	Perplexity46.92	109
Language Modeling	PTB (val)	Perplexity89.73	107
Language Modeling	LM1B (val)	Perplexity60.23	67
Language Modeling	Wikitext (test)	Perplexity28.31	66
Language Modeling	WikiText (val)	Perplexity28.31	62
Language Modeling	OpenWebText (OWT) (val)	Perplexity17.58	42
Language Modeling	LAMBADA (val)	Perplexity46.92	39

Showing 10 of 18 rows

Other info

Code

Follow for update

@wizwand_team Discord