
Fine-Tuning Diffusion Models via Intermediate Distribution Shaping

About

Diffusion models are widely used for generative tasks across domains. Given a pre-trained diffusion model, it is often desirable to fine-tune it further, either to correct errors in learning or to align it with downstream applications. Towards this, we examine the effect of shaping the distributions induced by diffusion models at intermediate noise levels. First, we show that existing variants of Rejection sAmpling based Fine-Tuning (RAFT), which we unify as GRAFT, can implicitly perform KL-regularized reward maximization with reshaped rewards. Motivated by this observation, we introduce P-GRAFT to shape distributions at intermediate noise levels and demonstrate empirically that this can lead to more effective fine-tuning. We explain this mathematically via a bias-variance tradeoff. Next, we turn to correcting learning errors in pre-trained flow models within the developed mathematical framework. In particular, we propose inverse noise correction, a novel algorithm to improve the quality of pre-trained flow models without explicit rewards. We empirically evaluate our methods on text-to-image (T2I) generation, layout generation, molecule generation and unconditional image generation. Notably, our framework, applied to Stable Diffusion v2, improves over policy gradient methods on popular T2I benchmarks in terms of VQAScore and shows an $8.81\%$ relative improvement over the base model. For unconditional image generation, inverse noise correction improves the FID of generated images at lower FLOPs per image.
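To make the RAFT-style fine-tuning loop concrete, here is a minimal sketch of the rejection-sampling filter it is built on: generate samples, keep only the highest-reward ones, and fine-tune on the accepted set. The toy sample space, reward function and top-k acceptance rule below are illustrative assumptions, not the paper's actual setup.

```python
import random


def rejection_filter(samples, reward, k):
    """Keep the top-k samples by reward.

    Fine-tuning the model on this filtered set is the core of
    RAFT-style methods; the GRAFT view is that this implicitly
    performs KL-regularized maximization of a reshaped reward.
    """
    return sorted(samples, key=reward, reverse=True)[:k]


# Toy example: "generations" are scalars in [0, 1); the (assumed)
# reward prefers samples close to 1.0.
random.seed(0)
samples = [random.random() for _ in range(100)]
reward = lambda x: -abs(x - 1.0)

accepted = rejection_filter(samples, reward, k=10)
# `accepted` would then serve as the fine-tuning dataset.
```

P-GRAFT differs in where the filter is applied: rather than accepting or rejecting only fully denoised samples, it shapes the distribution at an intermediate noise level, which the paper analyzes as a bias-variance tradeoff.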

Gautham Govind Anil, Shaan Ul Haque, Nithish Kannen, Dheeraj Nagaraj, Sanjay Shakkottai, Karthikeyan Shanmugam • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Image Generation | CelebA-HQ 256x256 | FID | 8.02 | 55 |
| Text-to-Image Generation | GenAI-Bench | -- | -- | 41 |
| Text-to-Image Generation | T2ICompBench++ (val) | VQAScore | 76.15 | 17 |
| Text-to-Image Generation | GenEval | VQAScore | 80.96 | 14 |
| Image Generation | LSUN Church 256x256 | FID | 7.26 | 10 |
| Molecule Generation | QM9 | Mol Stability | 0.9261 | 9 |
| Class-conditional Layout Generation | PubLayNet (test) | Alignment | 7.2 | 4 |
| Unconditional Layout Generation | PubLayNet (test) | Alignment | 0.071 | 4 |
