
Test-time Alignment of Diffusion Models without Reward Over-optimization

About

Diffusion models excel in generative tasks, but aligning them with specific objectives while maintaining their versatility remains challenging. Existing fine-tuning methods often suffer from reward over-optimization, while approximate guidance approaches fail to optimize target rewards effectively. Addressing these limitations, we propose a training-free, test-time method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution. Our approach, tailored for diffusion sampling and incorporating tempering techniques, achieves comparable or superior target rewards to fine-tuning methods while preserving diversity and cross-reward generalization. We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization. This work offers a robust solution for aligning diffusion models with diverse downstream objectives without compromising their general capabilities. Code is available at https://github.com/krafton-ai/DAS.
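To make the core idea concrete, here is a minimal, self-contained sketch of Sequential Monte Carlo sampling toward a reward-tilted target with a tempering schedule. This is an illustrative toy, not the paper's DAS implementation: the one-dimensional `reward` function, the Gaussian random-walk proposal standing in for a denoising step, and the linear `lam_final` annealing schedule are all hypothetical choices made for the example.

```python
import math
import random

def reward(x):
    # Hypothetical reward: peaks at x = 2.0 (stands in for an aesthetic/CLIP reward).
    return -(x - 2.0) ** 2

def smc_tempered_sampling(num_particles=256, num_steps=50, lam_final=5.0, seed=0):
    """Toy SMC sampler for a reward-aligned target distribution.

    Particles start from a standard-normal "prior", are perturbed by a toy
    proposal (a stand-in for one reverse-diffusion step), reweighted by the
    tempered incremental reward exp((lam_t - lam_{t-1}) * r(x)), and then
    resampled. lam_t anneals linearly from 0 to lam_final, so the reward
    tilt is introduced gradually rather than all at once.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    for t in range(1, num_steps + 1):
        # Toy proposal: small Gaussian perturbation of each particle.
        particles = [x + rng.gauss(0.0, 0.1) for x in particles]
        # Tempered incremental log-weights.
        lam_t = lam_final * t / num_steps
        lam_prev = lam_final * (t - 1) / num_steps
        logw = [(lam_t - lam_prev) * reward(x) for x in particles]
        # Normalize in a numerically stable way and resample (multinomial).
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        total = sum(w)
        probs = [wi / total for wi in w]
        particles = rng.choices(particles, weights=probs, k=num_particles)
    return particles
```

Run as a script, the particle population drifts from the prior toward the reward peak; because the reward enters only through reweighting and resampling, no gradients of the reward (and no fine-tuning of the "model", here the proposal) are needed, which is the sense in which such approaches are training-free.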

Sunwoo Kim, Minkyu Kim, Dongmin Park • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval 1.0 (test) | Overall Score | 34 | 63 |
| Personalized Image Generation | Personalized Image Generation dataset | CLIP-I | 0.8758 | 21 |
| Offline Reinforcement Learning | D4RL v2 (various) | Average Score | 80.2 | 17 |
| Text-to-Image Generation | ImageReward (test) | ImageReward Score | 1.052 | 16 |
| Text-to-Image Synthesis | 40 animal prompts, Stable Diffusion v1.5 (test) | Aesthetic Score | 7.22 | 9 |
| Personalized Image Generation | Human Evaluation, 30 volunteers (test) | Win Rate | 53 | 8 |
