# Test-time Alignment of Diffusion Models without Reward Over-optimization

## About
Diffusion models excel in generative tasks, but aligning them with specific objectives while maintaining their versatility remains challenging. Existing fine-tuning methods often suffer from reward over-optimization, while approximate guidance approaches fail to optimize target rewards effectively. Addressing these limitations, we propose a training-free, test-time method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution. Our approach, tailored for diffusion sampling and incorporating tempering techniques, achieves comparable or superior target rewards to fine-tuning methods while preserving diversity and cross-reward generalization. We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization. This work offers a robust solution for aligning diffusion models with diverse downstream objectives without compromising their general capabilities. Code is available at https://github.com/krafton-ai/DAS.
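To make the sampling idea concrete, below is a minimal, hypothetical sketch of SMC-based reward alignment with tempering, in the spirit of the method described above. The `model.denoise_step`, `model.predict_x0`, and `reward_fn` interfaces are illustrative assumptions, not the repository's actual API; see the linked DAS code for the real implementation.

```python
import torch

def smc_align(model, reward_fn, num_particles=16, num_steps=50, lambda_final=1.0):
    """Sketch: SMC sampling from a reward-tempered diffusion target.

    NOTE: `model.denoise_step`, `model.predict_x0`, and `reward_fn` are
    hypothetical stand-ins; the paper's proposal distribution and tempering
    schedule are more carefully tailored than this simplified loop.
    """
    # Start from i.i.d. Gaussian particles (image shape is illustrative).
    x = torch.randn(num_particles, 3, 64, 64)
    log_w = torch.zeros(num_particles)              # log importance weights
    lam_prev = 0.0
    r_prev = torch.zeros(num_particles)

    for t in reversed(range(num_steps)):
        # Tempering: anneal the reward exponent from 0 to lambda_final so
        # early, noisy particles are not over-penalized by the reward.
        lam = lambda_final * (num_steps - t) / num_steps

        # One reverse-diffusion step per particle (hypothetical API).
        x = model.denoise_step(x, t)

        # Score particles by the reward of a one-step denoised prediction,
        # a common surrogate for the reward of the final clean sample.
        x0_hat = model.predict_x0(x, t)             # hypothetical API
        r = reward_fn(x0_hat)                       # shape: (num_particles,)

        # Incremental weight for the annealed sequence of targets
        # p_t(x) ∝ p(x) * exp(lam_t * r(x)).
        log_w += lam * r - lam_prev * r_prev

        # Resample when the effective sample size (ESS) collapses.
        w = torch.softmax(log_w, dim=0)
        if 1.0 / (w ** 2).sum() < num_particles / 2:
            idx = torch.multinomial(w, num_particles, replacement=True)
            x, r = x[idx], r[idx]
            log_w = torch.zeros(num_particles)

        lam_prev, r_prev = lam, r

    return x
```

Because the base diffusion model supplies the proposal and the reward only reweights and resamples particles, the procedure stays anchored to the pretrained distribution, which is why this family of methods can avoid the reward over-optimization that fine-tuning exhibits.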
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval 1.0 (test) | Overall Score | 34 | 63 |
| Personalized Image Generation | Personalized Image Generation dataset | CLIP-I | 0.8758 | 21 |
| Offline Reinforcement Learning | D4RL v2 (various) | Average Score | 80.2 | 17 |
| Text-to-Image Generation | ImageReward (test) | ImageReward Score | 1.052 | 16 |
| Text-to-Image Synthesis | 40 animal prompts, Stable Diffusion v1.5 (test) | Aesthetic Score | 7.22 | 9 |
| Personalized Image Generation | Human Evaluation, 30 volunteers (test) | Win Rate | 53 | 8 |