
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

About

Modern preference alignment methods, such as DPO, rely on divergence regularization to a reference model for training stability, but this creates a fundamental problem we call "reference mismatch." In this paper, we investigate the negative impacts of reference mismatch in aligning text-to-image (T2I) diffusion models, showing that larger reference mismatch hinders effective adaptation given the same amount of data, for example when learning new artistic styles or personalizing to specific objects. To address this, we introduce margin-aware preference optimization (MaPO), a reference-agnostic approach that removes this constraint. By directly optimizing the likelihood margin between preferred and dispreferred outputs under the Bradley-Terry model, without anchoring to a reference, MaPO casts diverse T2I tasks as unified pairwise preference optimization. We validate MaPO's versatility across five challenging domains: (1) safe generation, (2) style adaptation, (3) cultural representation, (4) personalization, and (5) general preference alignment. Our results show that MaPO's advantage grows dramatically with the severity of reference mismatch: it outperforms both DPO and specialized methods such as DreamBooth while reducing training time by 15%. MaPO thus emerges as a versatile and memory-efficient method for generic T2I adaptation tasks.

Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval | Overall Score | 52.8 | 704 |
| Text-to-Image Generation | T2I-CompBench++ | Color | 0.609 | 95 |
| Text-to-Image Generation | PartiPrompts | ImageReward | 0.9324 | 92 |
| Text-to-Image Generation | Pick-a-Pic v2 (test) | PickScore | 55.9 | 92 |
| Compositional Image Generation | GenEval | Overall Score | 52.8 | 84 |
| Text-to-Image Generation | HPS v2 | HPSv2.1 Score | 0.2934 | 71 |
| Text-to-Image Generation | HPSv2 (test) | Aesthetic Score | 6.71 | 50 |
| Text-to-Image Generation | HPD | PickScore | 22.638 | 38 |
| Text-to-Image Generation | HPD v2 (test) | ImageReward | 92.5 | 29 |
| Text-to-Image Generation | Parti-Prompts (test) | Aesthetic Score | 72.4 | 29 |
Showing 10 of 28 rows
