
Fine-Tuning Diffusion Models via Intermediate Distribution Shaping

About

Diffusion models are widely used for generative tasks across domains. Given a pre-trained diffusion model, it is often desirable to fine-tune it further, either to correct errors in learning or to align it with downstream applications. Towards this, we examine the effect of shaping the distributions induced by diffusion models at intermediate noise levels. First, we show that existing variants of Rejection sAmpling based Fine-Tuning (RAFT), which we unify as GRAFT, can implicitly perform KL-regularized reward maximization with reshaped rewards. Motivated by this observation, we introduce P-GRAFT to shape distributions at intermediate noise levels and demonstrate empirically that this can lead to more effective fine-tuning. We explain this mathematically via a bias-variance tradeoff. Next, we turn to correcting learning errors in pre-trained flow models within the developed mathematical framework. In particular, we propose inverse noise correction, a novel algorithm to improve the quality of pre-trained flow models without explicit rewards. We empirically evaluate our methods on text-to-image (T2I) generation, layout generation, molecule generation and unconditional image generation. Notably, our framework, applied to Stable Diffusion v2, improves over policy gradient methods on popular T2I benchmarks in terms of VQAScore and shows an $8.81\%$ relative improvement over the base model. For unconditional image generation, inverse noise correction improves the FID of generated images at lower FLOPs per image.
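To make the RAFT-style fine-tuning loop concrete, here is a minimal sketch of the rejection-sampling filter it is built on: generate samples, keep only the highest-reward ones, and fine-tune on the accepted set. The toy sample space, reward function and top-k acceptance rule below are illustrative assumptions, not the paper's actual setup.

```python
import random


def rejection_filter(samples, reward, k):
    """Keep the top-k samples by reward.

    Fine-tuning the model on this filtered set is the core of
    RAFT-style methods; the GRAFT view is that this implicitly
    performs KL-regularized maximization of a reshaped reward.
    """
    return sorted(samples, key=reward, reverse=True)[:k]


# Toy example: "generations" are scalars in [0, 1); the (assumed)
# reward prefers samples close to 1.0.
random.seed(0)
samples = [random.random() for _ in range(100)]
reward = lambda x: -abs(x - 1.0)

accepted = rejection_filter(samples, reward, k=10)
# `accepted` would then serve as the fine-tuning dataset.
```

P-GRAFT differs in where the filter is applied: rather than accepting or rejecting only fully denoised samples, it shapes the distribution at an intermediate noise level, which the paper analyzes as a bias-variance tradeoff.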

Gautham Govind Anil, Shaan Ul Haque, Nithish Kannen, Dheeraj Nagaraj, Sanjay Shakkottai, Karthikeyan Shanmugam • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Image Generation | CelebA-HQ 256x256 | FID | 8.02 | 55 |
| Text-to-Image Generation | GenAI-Bench | -- | -- | 41 |
| Text-to-Image Generation | T2ICompBench++ (val) | VQAScore | 76.15 | 17 |
| Text-to-Image Generation | GenEval | VQAScore | 80.96 | 14 |
| Image Generation | LSUN Church 256x256 | FID | 7.26 | 10 |
| Molecule Generation | QM9 | Mol Stability | 0.9261 | 9 |
| Class-conditional Layout Generation | PubLayNet (test) | Alignment | 7.2 | 4 |
| Unconditional Layout Generation | PubLayNet (test) | Alignment | 0.071 | 4 |
