
Balancing Understanding and Generation in Discrete Diffusion Models

About

In discrete generative modeling, two dominant paradigms demonstrate divergent capabilities: Masked Diffusion Language Models (MDLM) excel at semantic understanding and zero-shot generalization, whereas Uniform-noise Diffusion Language Models (UDLM) achieve strong few-step generation quality, yet neither attains balanced performance across both dimensions. To address this, we propose XDLM, which bridges the two paradigms via a stationary noise kernel. XDLM offers two key contributions: (1) a principled theoretical unification of MDLM and UDLM that recovers each paradigm as a special case; and (2) an alleviated memory bottleneck enabled by an algebraic simplification of the posterior probabilities. Experiments demonstrate that XDLM advances the Pareto frontier between understanding capability and generation quality. Quantitatively, XDLM surpasses UDLM by 5.4 points on zero-shot text benchmarks and outperforms MDLM in few-step image generation (FID 54.1 vs. 80.8). When scaled to tune an 8B-parameter large language model, XDLM achieves a score of 15.0 on MBPP in just 32 steps, effectively doubling the baseline performance. Finally, analysis of training dynamics reveals XDLM's superior potential for long-term scaling. Code is available at https://github.com/MzeroMiko/XDLM.
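To make the bridging idea concrete, the sketch below shows one hypothetical way a forward corruption kernel can interpolate between masked (absorbing) and uniform noise, recovering each paradigm at the endpoints of a mixing weight. The function name `corrupt`, the mixing weight `lam`, and the overall parameterization are illustrative assumptions, not the stationary kernel actually used by XDLM; see the paper and repository for the real formulation.

```python
# Illustrative sketch only: a forward corruption step that blends masked
# (absorbing) and uniform discrete diffusion via a hypothetical mixing weight.
import torch

def corrupt(x0: torch.Tensor, alpha_t: float, lam: float,
            vocab_size: int, mask_id: int) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) for one noise level.

    Each token survives with probability alpha_t; otherwise it is replaced by
    [MASK] with probability `lam` or by a uniformly random token with
    probability 1 - lam.
      lam = 1.0 -> masked diffusion (MDLM-style absorbing noise)
      lam = 0.0 -> uniform-noise diffusion (UDLM-style)
    """
    keep = torch.rand_like(x0, dtype=torch.float) < alpha_t
    to_mask = torch.rand_like(x0, dtype=torch.float) < lam
    uniform = torch.randint_like(x0, low=0, high=vocab_size)
    noised = torch.where(to_mask, torch.full_like(x0, mask_id), uniform)
    return torch.where(keep, x0, noised)

# Example: corrupt a toy batch of token ids at a mid-range noise level.
x0 = torch.randint(0, 100, (2, 8))  # (batch, seq_len)
xt = corrupt(x0, alpha_t=0.5, lam=0.5, vocab_size=100, mask_id=100)
```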

Yue Liu, Yuzhong Zhao, Zheyong Xie, Qixiang Ye, Jianbin Jiao, Yao Hu, Shaosheng Cao, Yunfan Liu • 2026

Related benchmarks

Task | Dataset | Result | Rank
Language Modeling | PTB | Perplexity 90.796 | 650
Reasoning | BBH | -- | 507
Language Modeling | WikiText | PPL 32.748 | 479
Mathematical Reasoning | MATH | -- | 162
Language Modeling | LAMBADA | Perplexity 45.608 | 99
Image Generation | ImageNet-1k (val) | FID 25.774 | 84
Code Generation | HumanEval | HumanEval Score 31.71 | 50
Image Generation | ImageNet-1K | FID 8.625 | 42
Language Modeling | arXiv | Perplexity 37.232 | 21
Language Modeling | AG-News | PPL 62.768 | 20
(showing 10 of 12 rows)

Other info

GitHub: https://github.com/MzeroMiko/XDLM
