Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall

About

Discrete diffusion models offer a promising alternative to autoregressive generation through parallel decoding, but they suffer from a sampling wall: once categorical sampling occurs, rich distributional information collapses into one-hot vectors and cannot be propagated across steps, forcing subsequent steps to operate with limited information. To mitigate this problem, we introduce Loopholing, a simple novel mechanism that preserves this information via a deterministic latent pathway, leading to Loopholing Discrete Diffusion Models (LDDMs). Trained efficiently with a self-conditioning strategy that avoids unrolling the full denoising trajectory, LDDMs achieve substantial gains, reducing generative perplexity by up to 61% over prior baselines, closing (and in some cases surpassing) the gap with autoregressive models, and producing more coherent text. Applied to reasoning tasks, LDDMs also improve performance on arithmetic benchmarks such as Countdown and Game of 24. These results further indicate that loopholing mitigates idle steps and oscillations, providing a general and effective path toward high-quality non-autoregressive text generation.
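The sampling wall and the loopholing bypass can be illustrated with a toy denoising loop. This is a hedged sketch, not the authors' implementation: the denoiser, vocabulary size, and the choice to carry the pre-sampling distribution as the latent are all illustrative assumptions.

```python
# Toy sketch of the "sampling wall" and a loopholing-style deterministic
# bypass. All names (denoise_step, generate) and the tiny linear denoiser
# are hypothetical; they only illustrate the information flow.
import math

VOCAB = 4

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def denoise_step(one_hot, latent):
    """Toy denoiser: logits depend on both the sampled one-hot token and
    a deterministic latent carried across steps (the 'loophole')."""
    return [one_hot[i] + latent[i] for i in range(VOCAB)]

def generate(steps=3):
    one_hot = [1.0, 0.0, 0.0, 0.0]   # sampled token: distribution collapsed
    latent = [0.0] * VOCAB           # deterministic pathway, initialized to zero
    for _ in range(steps):
        logits = denoise_step(one_hot, latent)
        probs = softmax(logits)
        # Sampling wall: only a single category survives as a one-hot vector...
        idx = probs.index(max(probs))
        one_hot = [1.0 if i == idx else 0.0 for i in range(VOCAB)]
        # ...but loopholing also propagates the full pre-sampling
        # distribution deterministically into the next step.
        latent = probs
    return one_hot, latent

token, latent = generate()
print(token, latent)
```

Without the `latent` pathway, each step would see only the one-hot `token`; with it, the richer distribution computed before sampling remains available downstream.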

Mingyu Jo, Jaesik Yoon, Justin Deschenaux, Caglar Gulcehre, Sungjin Ahn • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Language Modeling | PTB (test) | Perplexity | 71.52 | 526
Question Answering | PIQA | Accuracy | 58.16 | 374
Language Modeling | LAMBADA | Accuracy | 52.4 | 268
Multiple-choice Question Answering | ARC Easy | Accuracy | 36.03 | 188
Language Modeling | arXiv (test) | Perplexity | 34.96 | 145
Language Modeling | LM1B (test) | Perplexity | 69.53 | 130
Language Modeling | One Billion Word Benchmark (test) | Test Perplexity | 25.95 | 113
Multiple-choice Question Answering | HellaSwag | Accuracy | 33.11 | 93
Language Modeling | LAMBADA (test) | -- | -- | 71
Language Modeling | Wikitext (test) | Perplexity | 33.27 | 62

Showing 10 of 20 rows
