Human Preference Alignment

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Updated
PKU-SafeRLHF	EXO-BT	BLEU	0.324	31	4mo ago
MT-Bench	RAG-Pref	SAG	6.83	20	1mo ago
PickScore	SuperFlow	PickScore	86.851	20	2mo ago
MM-AlignBench 1.0 (test)		Win Rate	84.9	18	4mo ago
Arena-Hard	KTO	Win Rate (%)	63.9	16	1mo ago
HPS v2.1	Infinity-EE-26	Anime Score	32.06	16	3mo ago
Out-of-Domain (test)		Agreement	81.8	15	4mo ago
HH (test)	TRE-P	Reward	3.8764	14	4mo ago
REACT-Video	REACT	Acc (Tie, Overall)	61	12	4mo ago
HPD v2	BranchGRPO	HPS-v2.1	0.379	10	4mo ago
HPD v2 (test)	DanceGRPO	HPSv2.1	0.371	7	4mo ago
Human Preference Alignment Out-of-Domain (test)	TAFS-GRPO	HPS-v2.1	35.3	7	4mo ago
Human Preference Alignment In-Domain (test)	TAFS-GRPO	Pick Score	22.46	7	4mo ago
DrawBench	Flow-GRPO	HPS-v2.1	37.7	6	2mo ago
SD3.5 Medium	TMPO	HPS-v2.1	0.361	6	2mo ago
FLUX.1 (dev)	TMPO	HPS-v2.1	0.36	6	2mo ago
Multi-Challenge	Qwen3-30A3-2507	Avg@3	49.4	6	4mo ago
ArenaHard V2	Qwen3-30A3-2507	Avg@3 Score	60	6	4mo ago
DrawBench Held-out (test)	RAM	PickScore (Training Reward)	23.67	5	2mo ago
Human Preference Alignment	OP-GRPO	PickScore	23.64	5	3mo ago
HPDv2	TreeGRPO	HPS-v2.1	0.3735	5	4mo ago
PickScore	VGPO	PickScore (Task)	23.55	5	4mo ago
VideoGen-RewardBench (test)	VideoReward	VQ Acc (w/ Tie)	66	5	4mo ago
User study dataset HH-RLHF and PKU-SafeRLHF prompts (test)	DPO-HPS	Quality Score	3.93	4	4mo ago
DrawBench Task-specific (test)	DenseGRPO	PickScore (Task Metric)	24.64	4	4mo ago
