GIFT: Guided Importance-Aware Fine-Tuning for Diffusion Language Models

About

Diffusion models have recently shown strong potential in language modeling, offering faster generation than traditional autoregressive approaches. However, applying supervised fine-tuning (SFT) to diffusion models remains challenging, because they lack precise probability estimates at each denoising step. While the diffusion mechanism lets the model reason over entire sequences, it also makes generation less predictable and often inconsistent. This highlights the importance of controlling the key tokens that steer the direction of generation. To address this issue, we propose GIFT, an importance-aware fine-tuning method for diffusion language models in which tokens are assigned different importance weights based on their entropy. Derived from diffusion theory, GIFT delivers substantial gains. Across diverse settings, including mainstream training datasets ranging in size from 1k to 10k, LoRA or full-parameter fine-tuning, and base or instruct models, GIFT consistently achieves better overall performance than standard SFT on four widely used reasoning benchmarks (Sudoku, Countdown, GSM8K, and MATH-500).
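The entropy-based token weighting described above could look roughly like the following. This is a minimal sketch, not the authors' implementation: the abstract does not give the weighting formula, so the choice made here (weights proportional to each token's predictive entropy, normalized to a mean of 1) and all function names are assumptions for illustration only.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weights(dists):
    """Hypothetical weighting: proportional to entropy, normalized to mean 1.

    Assumption: higher-entropy (more uncertain) tokens get larger weights.
    The source only says weights are "based on their entropy".
    """
    entropies = [token_entropy(d) for d in dists]
    total = sum(entropies)
    if total == 0.0:
        # All distributions are one-hot: fall back to uniform weights.
        return [1.0] * len(dists)
    n = len(dists)
    return [n * h / total for h in entropies]


def weighted_nll(dists, targets, eps=1e-12):
    """Importance-weighted negative log-likelihood averaged over tokens.

    In a masked diffusion language model this would be computed over the
    masked positions only; here every position is treated as masked.
    """
    weights = entropy_weights(dists)
    losses = [-math.log(max(d[t], eps)) for d, t in zip(dists, targets)]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Toy example: one confident token and one maximally uncertain token.
dists = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
]
targets = [0, 2]
weights = entropy_weights(dists)
loss = weighted_nll(dists, targets)
```

In this toy example the uniform distribution receives the larger weight, so its token dominates the loss. Setting every weight to 1 recovers a plain average negative log-likelihood, which is the unweighted baseline this sketch is meant to contrast with.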

Guowei Xu, Wenxin Xu, Jiawang Zhao, Kaisheng Ma • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Mathematical Reasoning	Countdown	Accuracy	14.1	252
Grade School Math Reasoning	GSM8K	Accuracy (GSM8K)	78.2	186
Code Generation	MBPP	Accuracy	44.4	165
Math Word Problem Solving	GSM8K	Accuracy	77.7	158
Logical reasoning	Sudoku	Accuracy	16	152
Math	MATH 500	Accuracy	33.7	126
Mathematical Problem Solving	MATH500	Accuracy	33	96
Grade School Math Word Problems	GSM8K	Accuracy	0.552	76
Reasoning	Countdown	Accuracy	21.7	49
Reasoning	Sudoku	Accuracy (Sudoku Reasoning)	17.6	25

Showing 10 of 14 rows
