Learning To Sample From Diffusion Models Via Inverse Reinforcement Learning

About

Diffusion models generate samples through an iterative denoising process guided by a pretrained neural network. Even once the denoiser is fixed, the sampling algorithm itself (noise schedules, guidance scales, stochasticity profiles) still requires careful tuning, which is typically done through costly empirical grid search. In this work, we introduce an inverse reinforcement learning framework for learning sampling strategies without retraining the denoiser. We formulate the diffusion sampling procedure as a discrete-time finite-horizon Markov Decision Process, where actions correspond to optional modifications of the sampling dynamics. To optimize action scheduling, we avoid defining an explicit reward function and instead directly match the target behavior expected from the sampler using policy gradient techniques. We provide experimental evidence that this approach matches the performance of fine-tuned samplers at a modest cost compared to grid search: on ImageNet-64, a single training run replaces exhaustive search at up to 9x lower cost, with only 16% overhead at inference.

Constant Bourdrez, Alexandre Vérine, Olivier Cappé • 2026
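The abstract does not specify the action space, the policy parameterization, or the exact matching objective, so the following is only a minimal toy sketch of the general idea (sampling as a finite-horizon MDP, with actions scheduled by policy gradient), not the authors' method. All of the following are assumptions made for illustration: 1-D Gaussian data whose closed-form posterior mean stands in for the pretrained denoiser; a binary per-step action that toggles EDM-style noise re-injection ("churn") before an Euler step; a state-independent Bernoulli policy per step; and a squared gap between the batch second moment and the data variance as a stand-in for "matching the target behavior". The gradient is estimated with REINFORCE and a leave-one-out baseline. Names such as `rollout`, `discrepancy`, `train`, `SIGMAS` and `CHURN` are hypothetical.

```python
import math
import random

# Toy assumptions: data x0 ~ N(0, DATA_STD^2), so the ideal denoiser has a
# closed form and stands in for a pretrained network.
DATA_STD = 1.0
SIGMAS = [10.0, 4.0, 1.5, 0.5, 0.1, 0.0]  # hypothetical schedule, horizon T = 5
CHURN = 0.4  # hypothetical strength of the optional noise re-injection


def denoise(x, sigma):
    """Exact posterior mean E[x0 | x0 + sigma * eps = x] for Gaussian data."""
    return x * DATA_STD**2 / (DATA_STD**2 + sigma**2)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rollout(logits, rng):
    """One episode of the MDP. State: (t, x_t); action a_t in {0, 1} = churn or not.

    Returns the final sample and d/dlogits of log pi(a_0, ..., a_{T-1}).
    """
    x = rng.gauss(0.0, math.sqrt(DATA_STD**2 + SIGMAS[0] ** 2))  # exact marginal at sigma_max
    grad_logp = [0.0] * len(logits)
    for t in range(len(SIGMAS) - 1):
        sigma, sigma_next = SIGMAS[t], SIGMAS[t + 1]
        p = sigmoid(logits[t])
        a = 1 if rng.random() < p else 0
        grad_logp[t] = a - p  # score of a Bernoulli(sigmoid(logit)) action
        if a:
            # Optional modification: raise the noise level by injecting fresh noise.
            sigma_hat = sigma * (1.0 + CHURN)
            x += math.sqrt(sigma_hat**2 - sigma**2) * rng.gauss(0.0, 1.0)
            sigma = sigma_hat
        # Euler step of the probability-flow ODE dx/dsigma = (x - D(x, sigma)) / sigma.
        x += (sigma_next - sigma) * (x - denoise(x, sigma)) / sigma
    return x, grad_logp


def discrepancy(samples):
    """Stand-in matching objective: squared gap between batch E[x^2] and data variance."""
    m2 = sum(x * x for x in samples) / len(samples)
    return (m2 - DATA_STD**2) ** 2


def train(steps=100, groups=8, batch=64, lr=0.5, seed=0):
    """REINFORCE on the action schedule with a leave-one-out baseline (groups >= 2)."""
    rng = random.Random(seed)
    horizon = len(SIGMAS) - 1
    logits = [0.0] * horizon  # start with p = 0.5 at every step
    for _ in range(steps):
        losses, scores = [], []
        for _ in range(groups):
            samples, score = [], [0.0] * horizon
            for _ in range(batch):
                x, glp = rollout(logits, rng)
                samples.append(x)
                score = [s + g for s, g in zip(score, glp)]  # joint score of the batch
            losses.append(discrepancy(samples))
            scores.append(score)
        total = sum(losses)
        grad = [0.0] * horizon
        for loss, score in zip(losses, scores):
            baseline = (total - loss) / (groups - 1)  # independent of this group: unbiased
            for t in range(horizon):
                grad[t] += (loss - baseline) * score[t] / groups
        logits = [th - lr * g for th, g in zip(logits, grad)]  # descend expected discrepancy
    return [sigmoid(th) for th in logits]  # learned per-step churn probabilities
```

Calling `train()` returns one churn probability per sampling step. In this toy, the purely deterministic Euler sampler (all actions 0) under-disperses, with a final variance of roughly 0.57 instead of 1 for the schedule above, which is the kind of mismatch the matching signal lets the policy correct without hand-tuning where to inject noise. A state-independent policy keeps the sketch small; a policy conditioned on `x_t` would fit the same MDP framing.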

Related benchmarks

Task	Dataset	Result
Image Generation	CIFAR10 32x32 (test)	FID 3.18	186
Image Generation	ImageNet 64x64 resolution (test)	FID 2.92	150
Image Generation	FFHQ 64x64 (test)	FID 3.04	104

