Human Preference Score: Better Aligning Text-to-Image Models with Human Preference

About

Recent years have witnessed rapid growth in deep generative models, with text-to-image models attracting significant public attention. However, existing models often generate images that do not align well with human preferences, for example awkward combinations of limbs and facial expressions. To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel. Our experiments show that current evaluation metrics for generative models correlate poorly with human choices. We therefore train a human preference classifier on the collected dataset and derive a Human Preference Score (HPS) from it. Using HPS, we propose a simple yet effective method to adapt Stable Diffusion so that it better aligns with human preferences. Our experiments show that HPS outperforms CLIP in predicting human choices and generalizes well to images generated by other models. After tuning Stable Diffusion with HPS guidance, the adapted model generates images that human users prefer more often. The project page is available at https://tgxs002.github.io/align_sd_web/.
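The abstract does not spell out how the score is computed from the classifier. The following is a minimal sketch, assuming HPS is a scaled cosine similarity between CLIP-style image and prompt embeddings (the scale of 100 is an assumption), and assuming the classifier is trained with a cross-entropy loss that picks the human-chosen image out of the candidates generated for one prompt. Embeddings are passed in as plain lists; the encoder itself is not shown, and the names `hps`, `rank_candidates`, and `preference_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hps(image_emb: Sequence[float], text_emb: Sequence[float], scale: float = 100.0) -> float:
    """Hypothetical HPS-style score: scaled image-prompt cosine similarity (assumed form)."""
    return scale * cosine_similarity(image_emb, text_emb)


def rank_candidates(image_embs: Sequence[Sequence[float]], text_emb: Sequence[float]) -> List[int]:
    """Indices of candidate images for one prompt, highest predicted preference first."""
    return sorted(range(len(image_embs)), key=lambda i: hps(image_embs[i], text_emb), reverse=True)


def preference_loss(image_embs: Sequence[Sequence[float]], text_emb: Sequence[float],
                    chosen: int, scale: float = 100.0) -> float:
    """Assumed training objective: cross-entropy of a softmax over candidate scores,
    with the human-chosen image as the target class."""
    logits = [hps(e, text_emb, scale) for e in image_embs]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[chosen]
```

For example, with prompt embedding `[1.0, 0.0]` and candidates `[[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]`, `rank_candidates` returns `[1, 2, 0]`. The loss is near zero when the chosen image is index 1 and close to 100 when it is index 0, so minimizing it pushes the preferred image's score above the others.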

Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, Hongsheng Li · 2023

Related benchmarks

Task	Dataset	Result
Human Preference Evaluation	ImageReward (test)	Preference Accuracy: 0.612	32
Human Preference Evaluation	HPD v2 (test)	Preference Accuracy: 77.6	32
Preference Evaluation	ImageReward	Accuracy: 61.2	29
Human preference prediction	HPD v2	Accuracy: 77.6	25
Preference Prediction	PickScore (test)	Accuracy: 66.7	19
Semiosis Quality Evaluation	HGI SemiosisArt	KRCC: 0.03	18
Text-to-Image Preference Prediction	Pick-a-Pic	Accuracy: 66.7	17
Text-to-Image Preference Prediction	Cross-domain Aggregate	Average Accuracy: 66.8	17
Pairwise Preference Prediction	DyCoBench-1K Overall Preference	Preference Rate (A > B): 65.3	17
Text-to-Image Preference Prediction	ImageReward	Accuracy: 61.2	17
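Most rows above report pairwise preference accuracy, and one reports KRCC (Kendall rank correlation coefficient). The page does not say how each benchmark computes these, so the sketch below uses textbook definitions under stated assumptions: a pair counts as correct when the model scores the human-chosen image higher, ties count as misses, and KRCC is computed as Kendall's tau-a (individual benchmarks may use tau-b or a different tie convention). The function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def preference_accuracy(pairs: Sequence[Tuple[float, float, int]]) -> float:
    """Fraction of pairs where the model agrees with the human choice.

    Each pair is (score_a, score_b, human_choice), with human_choice 0 for A and 1 for B.
    A tied score is counted as a miss (assumed convention).
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = 0
    for score_a, score_b, choice in pairs:
        if score_a == score_b:
            continue  # tie: no prediction, counted as incorrect
        predicted = 0 if score_a > score_b else 1
        correct += predicted == choice
    return correct / len(pairs)


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a between model scores x and human ratings y (O(n^2))."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # pairs tied in either x or y contribute to neither count
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For example, `kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])` finds five concordant pairs and one discordant pair, giving (5 - 1) / 6 ≈ 0.667. A KRCC near 0, as in the SemiosisArt row, means the scores' ranking is close to unrelated to the human ranking.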
