
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling

About

Aligning text-to-image diffusion models is critical for bringing generated images in line with human preferences. Training-based methods are constrained by high computational costs and dataset requirements, while training-free alignment methods remain underexplored and are often limited by inaccurate guidance. We propose DyMO, a plug-and-play, training-free method that aligns generated images with human preferences during inference. In addition to text-aware human preference scores, we introduce a semantic alignment objective that strengthens semantic alignment in the early stages of diffusion, based on the observation that attention maps effectively reflect the semantics of noisy images. We further propose dynamic scheduling of the multiple objectives and of intermediate recurrent steps, so that guidance reflects the different requirements at different denoising steps. Experiments with diverse pre-trained diffusion models and metrics demonstrate the effectiveness and robustness of the proposed method.
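The abstract describes the scheduling idea only at a high level. The following is a hypothetical toy sketch of what "dynamic scheduling of multiple objectives and intermediate recurrent steps" can look like in a guidance loop: two objective gradients are mixed with timestep-dependent weights (semantic objective emphasised early, preference objective late), and noisier steps are repeated more often with a little noise re-injected between passes. The logistic weight ramp, the recurrent-step schedule, the re-noising rule, and every name here (`objective_weights`, `recurrent_steps`, `guided_sample`, `toy_gradient`) are assumptions for illustration, not DyMO's actual formulation or code. No denoiser is modeled; a real method would differentiate a preference model and an attention-map-based semantic loss with respect to the noisy latent inside a diffusion sampler.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def objective_weights(t: int, num_steps: int,
                      switch: float = 0.5, sharpness: float = 10.0) -> Tuple[float, float]:
    """Hypothetical schedule returning (w_semantic, w_preference) at timestep t.

    t counts down from num_steps - 1 (noisiest) to 0. A logistic ramp (an
    assumed shape) moves weight from the semantic objective to the preference one.
    """
    progress = 1.0 - t / max(num_steps - 1, 1)  # 0.0 at the first step, 1.0 at the last
    w_pref = 1.0 / (1.0 + math.exp(-sharpness * (progress - switch)))
    return 1.0 - w_pref, w_pref


def recurrent_steps(t: int, num_steps: int, max_repeats: int = 3) -> int:
    """Hypothetical schedule: more recurrent passes at noisier (earlier) steps."""
    progress = 1.0 - t / max(num_steps - 1, 1)
    return max(1, round(max_repeats * (1.0 - progress)))


def toy_gradient(target: Vector) -> GradFn:
    """Gradient of the toy reward -||x - target||^2, i.e. -2 * (x - target)."""
    def grad(x: Vector) -> Vector:
        return [-2.0 * (xi - ti) for xi, ti in zip(x, target)]
    return grad


def guided_sample(x: Vector, num_steps: int,
                  grad_semantic: GradFn, grad_preference: GradFn,
                  step_size: float = 0.1, noise_scale: float = 0.05,
                  seed: int = 0) -> Vector:
    """Toy training-free guidance loop on a plain vector (no diffusion model).

    At each timestep the two objective gradients are mixed with the scheduled
    weights and x takes a gradient-ascent step on the combined objective. Every
    recurrent pass except the last re-injects Gaussian noise before repeating.
    """
    rng = random.Random(seed)
    for t in reversed(range(num_steps)):
        w_sem, w_pref = objective_weights(t, num_steps)
        repeats = recurrent_steps(t, num_steps)
        for r in range(repeats):
            g_sem = grad_semantic(x)
            g_pref = grad_preference(x)
            x = [xi + step_size * (w_sem * gs + w_pref * gp)
                 for xi, gs, gp in zip(x, g_sem, g_pref)]
            if r < repeats - 1:  # re-noise before the next recurrent pass
                x = [xi + noise_scale * rng.gauss(0.0, 1.0) for xi in x]
    return x


if __name__ == "__main__":
    semantic_grad = toy_gradient([1.0, 0.0])    # toy "semantic" target a
    preference_grad = toy_gradient([0.0, 1.0])  # toy "preference" target b
    print(guided_sample([0.0, 0.0], 20, semantic_grad, preference_grad))
```

With these toy targets the vector is first pulled toward `a` and finishes closer to `b`, mirroring the intended shift of emphasis from semantics early to preference late. Keeping both schedules as plain functions of the timestep makes it easy to swap in other shapes without touching the loop.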

Xin Xie, Dong Gong • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval 1.0 (test) | Overall Score | 50 | 63 |
| Text-to-Image Generation | Pick-a-Pic (val) | PickScore | 24.9 | 20 |
| Text-to-Image Generation | Pick-a-Pic 1K prompts v1 | ImageReward | 1.062 | 20 |
| Text-to-Image Generation | Pick-a-Pic, HPSv2, and PartiPrompts (test) | PickScore | 24.9 | 12 |
| Text-to-Image Generation | Pick-a-Pic (500), HPSv2 (500), and PartiPrompts (1000) (test) | PickScore | 23.07 | 10 |
| Text-to-Image Synthesis | GenEval SD V1.5 | Overall Score | 57 | 9 |

