Improving Sampling for Masked Diffusion Models via Information Gain

About

Masked Diffusion Models (MDMs) enable flexible decoding orders, yet existing samplers remain largely greedy, selecting locally certain tokens without accounting for their downstream effects. We show that this myopia can increase cumulative uncertainty and lead to suboptimal generation. To address this, we propose the **Info-Gain Sampler**, a training-free decoding method that uses the bidirectional structure of MDMs to balance immediate uncertainty with the information gained over remaining masked positions. Across reasoning, coding, creative writing, and image generation tasks, the Info-Gain Sampler consistently outperforms existing MDM samplers, improving average reasoning accuracy by 2.9 to 11.6 percentage points and achieving a 62.8% average win rate in creative writing. The code is available at https://github.com/yks23/Information-Gain-Sampler.
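The abstract does not spell out the scoring rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical `predict(seq)` function that returns a probability distribution for every masked position, and an assumed score of "entropy reduction over the remaining masked positions minus the entropy of the position being committed". The toy model `toy_predict`, the three-position sequence, and all function names are invented for illustration.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def greedy_pick(seq, predict):
    """Greedy baseline: unmask the position whose prediction is most certain."""
    dists = predict(seq)
    return min(dists, key=lambda i: entropy(dists[i]))

def info_gain_pick(seq, predict):
    """Assumed scoring rule (hypothetical, not taken from the paper):
    score = (entropy drop over the other masked positions) - (own entropy)."""
    dists = predict(seq)
    h_now = {i: entropy(p) for i, p in dists.items()}
    best_pos, best_tok, best_score = None, None, -math.inf
    for i, p in dists.items():
        tok = max(range(len(p)), key=p.__getitem__)  # most likely token at i
        trial = list(seq)
        trial[i] = tok                               # tentatively commit it
        h_after = sum(entropy(q) for q in predict(trial).values())
        h_before = sum(h for j, h in h_now.items() if j != i)
        score = (h_before - h_after) - h_now[i]
        if score > best_score:
            best_pos, best_tok, best_score = i, tok, score
    return best_pos, best_tok

def toy_predict(seq):
    """Hypothetical stand-in for an MDM over 3 positions and a 2-token vocabulary.
    Position 2 becomes nearly certain once position 1 is known."""
    dists = {}
    if seq[0] is None:
        dists[0] = [0.9, 0.1]
    if seq[1] is None:
        dists[1] = [0.85, 0.15]
    if seq[2] is None:
        dists[2] = [0.5, 0.5] if seq[1] is None else [0.99, 0.01]
    return dists

seq = [None, None, None]                 # None marks a masked position
greedy_choice = greedy_pick(seq, toy_predict)
info_gain_choice = info_gain_pick(seq, toy_predict)
print(greedy_choice, info_gain_choice)   # prints: 0 (1, 0)
```

In this toy setup the greedy rule unmasks position 0 because it is the most certain, even though doing so tells the model nothing about the rest of the sequence. The assumed info-gain score prefers position 1, which is slightly less certain but sharply reduces the uncertainty at position 2.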

Kaisen Yang, Jayden Teoh, Kaicheng Yang, Yitong Zhang, Alex Lamb • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	GSM8K	Accuracy 83.3	1398
Mathematical Reasoning	GSM8K (test)	Accuracy 88.9	954
Text-to-Image Generation	GenEval	--	218
Visual Question Answering	InfoVQA	Accuracy 33.37	195
Code Generation	MBPP	Accuracy 48.4	165
Planning	Sudoku	Accuracy 84.4	129
Information Visual Question Answering	InfoVQA	Accuracy 33.26	110
Multi-modal Reasoning	M3CoT	Accuracy 39.23	90
Planning	Countdown	Accuracy 45.2	89
Multi-modal Question Answering	MMBench	Accuracy 67.73	84

Showing 10 of 20 rows
