KLASS: KL-Guided Fast Inference in Masked Diffusion Models

About

Masked diffusion models have demonstrated competitive results on various tasks including language generation. However, due to its iterative refinement process, the inference is often bottlenecked by slow and static sampling speed. To overcome this problem, we introduce `KL-Adaptive Stability Sampling' (KLASS), a fast yet effective sampling method that exploits token-level KL divergence to identify stable, high-confidence predictions. By unmasking multiple tokens in each iteration without any additional model training, our approach speeds up generation significantly while maintaining sample quality. On reasoning benchmarks, KLASS achieves up to $2.78\times$ wall-clock speedups while improving performance over standard greedy decoding, attaining state-of-the-art results among diffusion-based samplers. We further validate KLASS across diverse domains, including text, image, and molecular generation, showing its effectiveness as a broadly applicable sampler across different models.

Seo Hyun Kim, Sunwoo Hong, Hojung Jung, Youngrok Park, Se-Young Yun• 2025

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	HellaSwag	--	1896
Mathematical Reasoning	GSM8K	Accuracy78.2	1398
Code Generation	HumanEval	Pass@159.8	1043
Mathematical Reasoning	GSM8K (test)	Accuracy55.4	954
Reasoning	BBH	--	726
Mathematical Reasoning	GSM8K	Speed Up (x)2.58	246
Code Generation	HumanEval	Accuracy54.88	217
Code Generation	MBPP	Accuracy40.6	165
Planning	Sudoku	Accuracy82.1	129
Planning	Countdown	Accuracy35.4	89

Showing 10 of 56 rows

Other info

GitHub

Follow for update

@wizwand_team Discord