
DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling

About

Diffusion language models intrinsically fail to capture correlations between decoded tokens, which leads to a harsh trade-off between sampling quality and throughput. To solve this issue, we propose DiLaDiff, a variant of masked diffusion language models with three components: (1) a continuous latent space with semantic capabilities, learned by an auto-encoder fine-tuned from an existing masked diffusion language model; (2) a latent diffusion model learning the prior over the encoder distribution; (3) a consistency model distilling the learned prior into a few-step latent generative model. We show that, even without distillation, our latent-guided diffusion model outperforms the masked diffusion baseline while significantly accelerating inference. Consistency distillation further lowers the computational overhead of continuous diffusion, such that the latent is generated in negligible time compared to discrete decoding.
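The correlation problem named in the first sentence can be illustrated with a toy example. The sketch below is not the DiLaDiff method: the two-token distribution, the function names, and the discrete "latent" are all hypothetical, chosen only to show why decoding several positions in one step from independent per-token marginals can yield incoherent combinations, and how conditioning on a shared latent sampled first removes the problem in this toy case.

```python
import random
from collections import Counter

# Hypothetical toy joint distribution over two token positions.
JOINT = {("New", "York"): 0.5, ("Los", "Angeles"): 0.5}


def marginals(joint):
    """Return the per-position marginal distributions of the joint."""
    first, second = Counter(), Counter()
    for (a, b), p in joint.items():
        first[a] += p
        second[b] += p
    return dict(first), dict(second)


def sample(dist, rng):
    """Draw one key from a {value: probability} dictionary."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]


def decode_parallel(joint, rng):
    """Decode both positions in a single step from independent marginals."""
    first, second = marginals(joint)
    return sample(first, rng), sample(second, rng)


def decode_with_latent(joint, rng):
    """Sample a latent z first, then decode each position independently given z."""
    pairs = list(joint)
    # In this toy setup the latent is just the index of a mode of the joint.
    z = rng.choices(range(len(pairs)), weights=[joint[p] for p in pairs], k=1)[0]
    a, b = pairs[z]
    # Given z, each position has its own (here deterministic) distribution.
    return sample({a: 1.0}, rng), sample({b: 1.0}, rng)


def invalid_rate(decoder, joint, n=10_000, seed=0):
    """Fraction of decoded pairs that have zero probability under the joint."""
    rng = random.Random(seed)
    bad = sum(decoder(joint, rng) not in joint for _ in range(n))
    return bad / n


if __name__ == "__main__":
    print("parallel, no latent:", invalid_rate(decode_parallel, JOINT))
    print("parallel, with latent:", invalid_rate(decode_with_latent, JOINT))
```

In this toy case, roughly half of the pairs decoded without a latent are mixtures such as ("New", "Angeles"), while every pair decoded given the latent is valid, even though both positions are still produced in a single parallel step.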

Jean-Marie Lemercier, Tomas Geffner, Karsten Kreis, Morteza Mardani, Arash Vahdat, Ante Jukić • 2026

Related benchmarks

Task                 Dataset             Metric              Result  Rank
Text Generation      OpenWebText         Perplexity          47.8    142
Text Generation      OpenWebText (test)  Average Perplexity  76.3    13
Text Reconstruction  OpenWebText (val)   Perplexity (PPL)    2.15    9
Language Generation  OpenWebText (test)  GenPPL (Oracle)     31.4    9
