On the $\epsilon$-Free Inference Complexity of Absorbing Discrete Diffusion
About
Absorbing discrete diffusion has emerged as a dominant framework for discrete data generation. However, a significant gap remains between its empirical success and its theoretical understanding: existing analyses fail to demonstrate a complexity advantage over the $\mathcal{O}(d \ln(d/\epsilon))$ baseline established for \emph{uniform} discrete diffusion. We bridge this gap by identifying a critical structural advantage: whereas uniform diffusion redundantly re-denoises already-valid elements, the absorbing scheme denoises each absorbing state exactly once. Leveraging this insight, we introduce \emph{Absorbing-Aware Truncated Uniformization} (AATU). We prove that AATU achieves $\epsilon$-accuracy in total variation (TV) distance with $\mathcal{O}(d \ln d)$ complexity, \emph{independent} of the error tolerance $\epsilon$, thereby strictly outperforming existing uniform baselines. Beyond improving convergence rates, our analysis removes the restrictive bounded-score assumption commonly required in prior studies of uniformization-based inference. Furthermore, we extend AATU to time-invariant parameterizations, showing that it naturally admits an imputation-style inference procedure with a uniformly randomized denoising order. When combined with a lazy update strategy, TV convergence requires only $\mathcal{O}(d)$ discrete score evaluations. These results not only establish a rigorous foundation for absorbing discrete diffusion, confirming its efficiency in high-accuracy generation, but also open new avenues for analyzing diffusion-based language models under the masking paradigm.
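To make the imputation-style inference concrete, the sketch below illustrates the core idea described in the abstract: starting from an all-mask sequence, positions are visited in a uniformly random order and each absorbing (masked) position is denoised exactly once, so a length-$d$ sequence costs exactly $d$ denoiser calls. This is a minimal sketch under assumed interfaces, not the paper's AATU implementation; the names `imputation_sample`, `denoiser`, and `MASK_ID` are hypothetical, and the denoiser is assumed to map a token sequence to per-position logits.

```python
import torch

MASK_ID = 0  # hypothetical id of the absorbing ("mask") state


@torch.no_grad()
def imputation_sample(denoiser, d, device="cpu"):
    """Sketch of imputation-style sampling: every absorbing position is
    denoised exactly once, in a uniformly random order, so generating a
    length-d sequence costs exactly d denoiser calls."""
    x = torch.full((d,), MASK_ID, dtype=torch.long, device=device)
    for pos in torch.randperm(d, device=device):  # uniformly random denoising order
        logits = denoiser(x)                      # assumed shape: (d, vocab_size)
        probs = torch.softmax(logits[pos], dim=-1)
        probs[MASK_ID] = 0.0                      # never re-enter the absorbing state
        x[pos] = torch.multinomial(probs / probs.sum(), 1).squeeze()
    return x


# Exercising the loop with a stand-in denoiser that returns uniform logits:
sample = imputation_sample(lambda x: torch.zeros(x.shape[0], 50), d=16)
```

Because each position leaves the absorbing state once and never re-enters it, the loop performs exactly $d$ score evaluations, consistent with the $\mathcal{O}(d)$ count the abstract attributes to the lazy-update variant; the time-invariant parameterization is what allows the denoiser to be queried without an explicit time input.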
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Unconditional Text Generation | 32 Generated Samples (inference) | Average Perplexity: 31.82 | 6 |
| Total Variation (TV) Convergence | Reverse particle SDEs | -- | 3 |