Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences

About

Direct Preference Optimization (DPO) aligns text-to-image (T2I) generation models with human preferences using pairwise preference data. Although substantial resources are expended in collecting and labeling datasets, a critical aspect is often neglected: preferences vary across individuals and should be represented with more granularity. To address this, we propose SmPO-Diffusion, a novel method for modeling preference distributions to improve the DPO objective, along with a numerical upper bound estimation for the diffusion optimization objective. First, we introduce a smoothed preference distribution to replace the original binary distribution. We employ a reward model to simulate human preferences and apply preference likelihood averaging to improve the DPO loss, such that the loss function approaches zero when preferences are similar. Furthermore, we utilize an inversion technique to simulate the trajectory preference distribution of the diffusion model, enabling more accurate alignment with the optimization objective. Our approach effectively mitigates issues of excessive optimization and objective misalignment present in existing methods through straightforward modifications. Our SmPO-Diffusion achieves state-of-the-art performance in preference evaluation, outperforming baselines across metrics with lower training costs. The project page is https://jaydenlyh.github.io/SmPO-project-page/.
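
The idea of replacing a binary preference label with a smoothed one can be illustrated with a small sketch. This is not the authors' formulation: it assumes a Bradley-Terry soft label computed from two hypothetical reward-model scores, and it weights a standard DPO-style log-sigmoid term by how far that label is from 0.5, which is one simple way to make the loss vanish when the two images are rated as equally good. The names `soft_preference`, `smoothed_dpo_loss`, `temperature`, and `beta` are illustrative assumptions.

```python
import math


def soft_preference(reward_a: float, reward_b: float, temperature: float = 1.0) -> float:
    """Bradley-Terry probability that image A is preferred over image B.

    The reward scores stand in for a reward model's outputs (assumption).
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b) / temperature))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def smoothed_dpo_loss(margin: float, p: float, beta: float = 1.0) -> float:
    """DPO-style loss weighted by a smoothed preference label (illustrative only).

    margin: implicit reward gap between A and B, i.e. the difference of the
            policy-vs-reference log-ratios for the two images.
    p:      soft probability that A is preferred (0.5 means "no preference").
    """
    weight = 2.0 * p - 1.0                    # in [-1, 1]; 0 when preferences tie
    direction = math.copysign(1.0, weight)    # which image the label favors
    return abs(weight) * -log_sigmoid(beta * direction * margin)


if __name__ == "__main__":
    p = soft_preference(reward_a=0.8, reward_b=0.2)
    print("soft label:", p)
    print("loss at this label:", smoothed_dpo_loss(margin=0.1, p=p))
    print("loss at a tie:", smoothed_dpo_loss(margin=0.1, p=0.5))
```

With a hard label (`p = 1.0`) the function reduces to the usual `-log sigmoid(beta * margin)` term, so the sketch differs from plain DPO only in how strongly near-tied pairs contribute.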

Yunhong Lu, Qichao Wang, Hengyuan Cao, Xiaoyin Xu, Min Zhang • 2025

Related benchmarks

Task	Dataset	Metric	Result
Text-to-Image Generation	GenEval	Overall Score	57.86	914
Text-to-Image Generation	GenEval	Overall Score	57.86	318
Text-to-Image Generation	GenEval (test)	Two Obj. Acc	41.92	250
Text-to-Image Generation	Pick-a-Pic	--	--	150
Text-to-Image Generation	HPD v2 (test)	ImageReward	108.1	53
Diffusion Model Fine-tuning	SDXL	GPU Hours	120.1	8
Diffusion Model Fine-tuning	SD 1.5	GPU Hours	27.6	8
Image Generation	Parti-Prompt (test)	HPSv2.1	30.03	6
Image Generation	Pick-a-Pic (test)	HPSv2.1	31.03	6
Text-to-Image Generation	HPD v2	HPS v2.1	26.41	6
