# Test-time Alignment of Diffusion Models without Reward Over-optimization

## About
Diffusion models excel in generative tasks, but aligning them with specific objectives while maintaining their versatility remains challenging. Existing fine-tuning methods often suffer from reward over-optimization, while approximate guidance approaches fail to optimize target rewards effectively. Addressing these limitations, we propose a training-free, test-time method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution. Our approach, tailored for diffusion sampling and incorporating tempering techniques, achieves comparable or superior target rewards to fine-tuning methods while preserving diversity and cross-reward generalization. We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization. This work offers a robust solution for aligning diffusion models with diverse downstream objectives without compromising their general capabilities. Code is available at https://github.com/krafton-ai/DAS.
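To make the sampling idea concrete, below is a minimal, hypothetical sketch of SMC-based reward alignment with tempering, in the spirit of the method described above. The `model.denoise_step`, `model.predict_x0`, and `reward_fn` interfaces are illustrative assumptions, not the repository's actual API; see the linked DAS code for the real implementation.

```python
import torch

def smc_align(model, reward_fn, num_particles=16, num_steps=50, lambda_final=1.0):
    """Sketch: SMC sampling from a reward-tempered diffusion target.

    NOTE: `model.denoise_step`, `model.predict_x0`, and `reward_fn` are
    hypothetical stand-ins; the paper's proposal distribution and tempering
    schedule are more carefully tailored than this simplified loop.
    """
    # Start from i.i.d. Gaussian particles (image shape is illustrative).
    x = torch.randn(num_particles, 3, 64, 64)
    log_w = torch.zeros(num_particles)              # log importance weights
    lam_prev = 0.0
    r_prev = torch.zeros(num_particles)

    for t in reversed(range(num_steps)):
        # Tempering: anneal the reward exponent from 0 to lambda_final so
        # early, noisy particles are not over-penalized by the reward.
        lam = lambda_final * (num_steps - t) / num_steps

        # One reverse-diffusion step per particle (hypothetical API).
        x = model.denoise_step(x, t)

        # Score particles by the reward of a one-step denoised prediction,
        # a common surrogate for the reward of the final clean sample.
        x0_hat = model.predict_x0(x, t)             # hypothetical API
        r = reward_fn(x0_hat)                       # shape: (num_particles,)

        # Incremental weight for the annealed sequence of targets
        # p_t(x) ∝ p(x) * exp(lam_t * r(x)).
        log_w += lam * r - lam_prev * r_prev

        # Resample when the effective sample size (ESS) collapses.
        w = torch.softmax(log_w, dim=0)
        if 1.0 / (w ** 2).sum() < num_particles / 2:
            idx = torch.multinomial(w, num_particles, replacement=True)
            x, r = x[idx], r[idx]
            log_w = torch.zeros(num_particles)

        lam_prev, r_prev = lam, r

    return x
```

Because the base diffusion model supplies the proposal and the reward only reweights and resamples particles, the procedure stays anchored to the pretrained distribution, which is why this family of methods can avoid the reward over-optimization that fine-tuning exhibits.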
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval 1.0 (test) | Overall Score | 34 | 63 |
| Personalized Image Generation | Personalized Image Generation dataset | CLIP-I | 0.8758 | 21 |
| Offline Reinforcement Learning | D4RL v2 (various) | Average Score | 80.2 | 17 |
| Text-to-Image Generation | ImageReward (test) | ImageReward Score | 1.052 | 16 |
| Text-to-Image Synthesis | 40 animal prompts, Stable Diffusion v1.5 (test) | Aesthetic Score | 7.22 | 9 |
| Personalized Image Generation | Human Evaluation, 30 volunteers (test) | Win Rate | 53 | 8 |