On the $\epsilon$-Free Inference Complexity of Absorbing Discrete Diffusion
About
Absorbing discrete diffusion has emerged as a dominant framework for discrete data generation. However, a significant gap remains between its empirical success and its theoretical understanding: existing analyses fail to demonstrate a complexity advantage over the $\mathcal{O}(d \ln(d/\epsilon))$ baseline established for \emph{uniform} discrete diffusion. We bridge this gap by identifying a critical structural advantage: whereas uniform diffusion redundantly re-denoises already-valid elements, the absorbing scheme denoises each absorbing state exactly once. Leveraging this insight, we introduce \emph{Absorbing-Aware Truncated Uniformization} (AATU). We prove that AATU achieves $\epsilon$-accuracy in total variation (TV) distance with $\mathcal{O}(d \ln d)$ complexity, \emph{independent} of the error tolerance $\epsilon$, thereby strictly outperforming existing uniform baselines. Beyond improving convergence rates, our analysis removes the restrictive bounded-score assumption commonly required in prior studies of uniformization-based inference. Furthermore, we extend AATU to time-invariant parameterizations, showing that it naturally admits an imputation-style inference procedure with a uniformly randomized denoising order. When combined with a lazy update strategy, TV convergence requires only $\mathcal{O}(d)$ discrete score evaluations. These results not only establish a rigorous foundation for absorbing discrete diffusion, confirming its efficiency in high-accuracy generation, but also open new avenues for analyzing diffusion-based language models under the masking paradigm.
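To make the imputation-style inference concrete, the sketch below illustrates the core idea described in the abstract: starting from an all-mask sequence, positions are visited in a uniformly random order and each absorbing (masked) position is denoised exactly once, so a length-$d$ sequence costs exactly $d$ denoiser calls. This is a minimal sketch under assumed interfaces, not the paper's AATU implementation; the names `imputation_sample`, `denoiser`, and `MASK_ID` are hypothetical, and the denoiser is assumed to map a token sequence to per-position logits.

```python
import torch

MASK_ID = 0  # hypothetical id of the absorbing ("mask") state


@torch.no_grad()
def imputation_sample(denoiser, d, device="cpu"):
    """Sketch of imputation-style sampling: every absorbing position is
    denoised exactly once, in a uniformly random order, so generating a
    length-d sequence costs exactly d denoiser calls."""
    x = torch.full((d,), MASK_ID, dtype=torch.long, device=device)
    for pos in torch.randperm(d, device=device):  # uniformly random denoising order
        logits = denoiser(x)                      # assumed shape: (d, vocab_size)
        probs = torch.softmax(logits[pos], dim=-1)
        probs[MASK_ID] = 0.0                      # never re-enter the absorbing state
        x[pos] = torch.multinomial(probs / probs.sum(), 1).squeeze()
    return x


# Exercising the loop with a stand-in denoiser that returns uniform logits:
sample = imputation_sample(lambda x: torch.zeros(x.shape[0], 50), d=16)
```

Because each position leaves the absorbing state once and never re-enters it, the loop performs exactly $d$ score evaluations, consistent with the $\mathcal{O}(d)$ count the abstract attributes to the lazy-update variant; the time-invariant parameterization is what allows the denoiser to be queried without an explicit time input.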
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Unconditional Text Generation | 32 Generated Samples (inference) | Average Perplexity: 31.82 | 6 |
| Total Variation (TV) Convergence | Reverse particle SDEs | -- | 3 |