SIPO: Stabilized and Improved Preference Optimization for Aligning Diffusion Models

About

Preference learning has attracted extensive attention as an effective technique for aligning diffusion models with human preferences in visual generation. However, existing alignment approaches such as Diffusion-DPO face two fundamental challenges. The first is training instability, caused by high gradient variance at various timesteps and high parameter sensitivity. The second is off-policy bias, which arises from the discrepancy between the optimization data and the policy model's distribution. Our first contribution is a systematic analysis of diffusion trajectories across timesteps, which identifies early timesteps with low importance weights as the primary source of this instability. To address these issues, we propose SIPO, a Stabilized and Improved Preference Optimization framework for aligning diffusion models with human preferences. Concretely, we introduce DPO-C&M, a key gradient modification that stabilizes training by clipping and masking uninformative timesteps. This is followed by a timestep-aware importance-reweighting paradigm that mitigates off-policy bias and emphasizes informative updates throughout the alignment process. Extensive experiments on various base models, including the image generation models SD1.5 and SDXL and the video generation models CogVideoX-2B/5B and Wan2.1-1.3B, demonstrate that SIPO consistently stabilizes training and outperforms existing alignment methods that rely on meticulous parameter tuning. Overall, these results highlight the importance of timestep-aware alignment and provide valuable guidelines for improving preference optimization in diffusion model alignment.
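The abstract names DPO-C&M and timestep-aware importance reweighting but does not give their formulas. As a minimal sketch under stated assumptions, the Python below combines the per-timestep Diffusion-DPO loss (the published objective, with its timestep weighting folded into `beta`) with one plausible clip-and-mask reweighting step. The log importance weights, `clip_max`, `mask_threshold`, and all helper names are hypothetical illustrations, not SIPO's actual definitions.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) for a scalar x."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def diffusion_dpo_term(
    err_w_policy: float,
    err_w_ref: float,
    err_l_policy: float,
    err_l_ref: float,
    beta: float,
) -> float:
    """Per-timestep Diffusion-DPO loss computed from denoising errors.

    Each err_* is a squared noise-prediction error ||eps - eps_model(x_t, t)||^2
    for the preferred (w) or dispreferred (l) sample, under the trainable
    policy or the frozen reference model. The timestep weighting of the
    original objective is folded into beta here for simplicity.
    """
    # Positive margin: the policy reduced its error on the winner more than on the loser.
    margin = (err_w_ref - err_w_policy) - (err_l_ref - err_l_policy)
    return -log_sigmoid(beta * margin)


def clipped_masked_reweighted_loss(
    dpo_terms: Sequence[float],
    log_importance_weights: Sequence[float],
    clip_max: float = 2.0,        # hypothetical upper bound on any timestep weight
    mask_threshold: float = 0.1,  # hypothetical cutoff for "uninformative" timesteps
) -> float:
    """Importance-weighted average of per-timestep DPO terms (illustrative only).

    How the log importance weights are estimated is an assumption left open here.
    Timesteps whose weight falls below mask_threshold are masked out; the
    remaining weights are clipped to clip_max and then normalized.
    """
    total, norm = 0.0, 0.0
    for loss_t, log_w in zip(dpo_terms, log_importance_weights):
        w = math.exp(log_w)
        if w < mask_threshold:
            continue  # mask: a low-weight timestep contributes nothing
        w = min(w, clip_max)  # clip: bound the influence of any single timestep
        total += w * loss_t
        norm += w
    return total / norm if norm > 0 else 0.0


# Toy usage with made-up errors for three timesteps (not data from the paper).
terms = [
    diffusion_dpo_term(0.50, 0.60, 0.70, 0.65, beta=5.0),
    diffusion_dpo_term(0.40, 0.42, 0.45, 0.44, beta=5.0),
    diffusion_dpo_term(0.90, 0.80, 0.70, 0.90, beta=5.0),
]
log_weights = [math.log(0.05), 0.0, math.log(3.0)]  # first is masked, last is clipped
print(clipped_masked_reweighted_loss(terms, log_weights))
```

Normalizing by the surviving weights keeps the loss on a comparable scale regardless of how many timesteps are masked, while clipping prevents one timestep with an unusually large weight from dominating the update.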

Xiaomeng Yang, Mengping Yang, Junyan Wang, Zhijian Zhou, Zhiyu Tan, Hao Li • 2025

Related benchmarks

Task	Dataset	Metric	Result
Text-to-Image Generation	GenEval	Overall Score	0.85
Text-to-Image Generation	Pick-a-Pic (test)	PickScore	22.013
Text-to-Video Generation	VBench	Total Score	84.78

