Generalized Interpolating Discrete Diffusion

About

While state-of-the-art language models achieve impressive results through next-token prediction, they have inherent limitations such as the inability to revise already generated tokens. This has prompted exploration of alternative approaches such as discrete diffusion. However, masked diffusion, which has emerged as a popular choice due to its simplicity and effectiveness, reintroduces this inability to revise words. To overcome this, we generalize masked diffusion, deriving a new family of generalized interpolating discrete diffusion (GIDD) processes that offers greater flexibility in the design of the noising process. Leveraging a novel diffusion ELBO, we achieve compute-matched state-of-the-art performance in diffusion language modeling. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality and unlocking the ability for the model to correct its own mistakes, an area where autoregressive models have notoriously struggled. Code: https://github.com/dvruette/gidd/
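To make the idea of combining masking and uniform noise concrete, here is a minimal sketch of one way such a corruption step could look. It is not taken from the paper or its repository: the function name `hybrid_noise`, the `[MASK]` symbol, and the fixed parameters `alpha` (probability of keeping a token) and `p_uniform` (share of corrupted tokens that become random vocabulary tokens rather than masks) are all assumptions chosen for illustration. The paper's actual noising process may differ, for example by varying the mixture over time.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def hybrid_noise(tokens, vocab, alpha, p_uniform, rng):
    """Corrupt a token sequence with a mix of masking and uniform noise.

    Each token is kept with probability `alpha`. A corrupted token is
    replaced by a uniformly sampled vocabulary token with probability
    `p_uniform`, and by MASK otherwise. All parameter names are
    illustrative assumptions.
    """
    noisy = []
    for tok in tokens:
        if rng.random() < alpha:
            noisy.append(tok)                # token survives unchanged
        elif rng.random() < p_uniform:
            noisy.append(rng.choice(vocab))  # uniform noise: random token
        else:
            noisy.append(MASK)               # masking noise
    return noisy


if __name__ == "__main__":
    vocab = ["the", "cat", "sat", "on", "mat"]
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    rng = random.Random(0)
    print(hybrid_noise(sentence, vocab, alpha=0.5, p_uniform=0.2, rng=rng))
```

With `p_uniform = 0` the sketch reduces to pure masking. With `p_uniform > 0`, some corrupted positions hold plausible-looking but wrong tokens, so a denoiser trained on such inputs has to learn to replace existing tokens, which is the kind of self-correction the abstract refers to.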

Dimitri von Rütte, Janis Fluri, Yuhui Ding, Antonio Orvieto, Bernhard Schölkopf, Thomas Hofmann • 2025

Related benchmarks

Task	Dataset	Result
Language Modeling	PTB	Perplexity 86.911	1234
Language Modeling	WikiText	PPL 30.809	740
Question Answering	PIQA	Accuracy 52.83	505
Question Answering	OBQA	Accuracy 26.6	347
Language Modeling	LAMBADA	Perplexity 47.811	198
Text Generation	OpenWebText	Perplexity 249.8	142
Image Generation	ImageNet-1k (val)	FID 35.403	106
Language Modeling	LM1B	PPL (Generalized) 118.6	93
Language Modeling	OWT	Gen. PPL 63.8	78
Language Modeling	LM1B (val)	Perplexity 32.98	67

Showing 10 of 22 rows
