Generalized Interpolating Discrete Diffusion

About

While state-of-the-art language models achieve impressive results through next-token prediction, they have inherent limitations such as the inability to revise already generated tokens. This has prompted exploration of alternative approaches such as discrete diffusion. However, masked diffusion, which has emerged as a popular choice due to its simplicity and effectiveness, reintroduces this inability to revise words. To overcome this, we generalize masked diffusion, deriving a new family of general interpolating discrete diffusion (GIDD) processes, which offers greater flexibility in the design of the noising process. Leveraging a novel diffusion ELBO, we achieve compute-matched state-of-the-art performance in diffusion language modeling. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality and unlocking the ability for the model to correct its own mistakes, an area where autoregressive models have notoriously struggled. Code: https://github.com/dvruette/gidd/

Dimitri von Rütte, Janis Fluri, Yuhui Ding, Antonio Orvieto, Bernhard Schölkopf, Thomas Hofmann • 2025
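
To make the "hybrid approach combining masking and uniform noise" more concrete, here is a minimal sketch of one way such a corruption step could look. It is not taken from the paper or its repository: the function name `hybrid_noise`, the `MASK` symbol, and the parameters `alpha` (probability of keeping a token) and `p_uniform` (share of the noise that is uniform rather than masking) are all hypothetical and chosen only for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not defined by the source


def hybrid_noise(tokens, vocab, alpha, p_uniform, rng=random):
    """Corrupt each token independently (illustrative assumption only).

    With probability `alpha` the token is kept. Otherwise it is replaced
    by a uniformly drawn vocabulary token (probability `p_uniform`) or by
    MASK (probability 1 - p_uniform). A uniform draw may happen to return
    the original token.
    """
    if not 0.0 <= alpha <= 1.0 or not 0.0 <= p_uniform <= 1.0:
        raise ValueError("alpha and p_uniform must lie in [0, 1]")
    noised = []
    for tok in tokens:
        if rng.random() < alpha:
            noised.append(tok)                # keep the clean token
        elif rng.random() < p_uniform:
            noised.append(rng.choice(vocab))  # uniform noise
        else:
            noised.append(MASK)               # masking noise
    return noised
```

In this sketch, `p_uniform = 0` reduces to pure masking, while `p_uniform > 0` also produces sequences containing wrong but unmasked tokens, which is the kind of input a denoiser would plausibly need to see in order to learn to replace tokens rather than only fill in masks.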

Related benchmarks

Task                Dataset            Result                   Rank
Language Modeling   PTB                Perplexity 86.911        1234
Language Modeling   WikiText           PPL 30.809               740
Question Answering  PIQA               Accuracy 52.83           505
Question Answering  OBQA               Accuracy 26.6            347
Language Modeling   LAMBADA            Perplexity 47.811        198
Text Generation     OpenWebText        Perplexity 249.8         142
Image Generation    ImageNet-1k (val)  FID 35.403               106
Language Modeling   LM1B               PPL (Generalized) 118.6  93
Language Modeling   OWT                Gen. PPL 63.8            78
Language Modeling   LM1B (val)         Perplexity 32.98         67
Showing 10 of 22 rows
