SOTA Human Preference Alignment benchmarks and papers with code | Wizwand
Human Preference Alignment
Benchmarks
| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| PKU-SafeRLHF | EXO-BT | BLEU | 0.324 | 31 | 26d ago |
| MM-AlignBench 1.0 (test) | Claude3.5V-Sonnet | Win Rate | 84.9 | 18 | 1mo ago |
| Out-of-Domain (test) | Annotators | Agreement | 81.8 | 15 | 23d ago |
| HH (test) | TRE-P | Reward | 3.8764 | 14 | 1mo ago |
| REACT-Video | REACT | Acc (Tie, Overall) | 61 | 12 | 1mo ago |
| HPD v2 | BranchGRPO | HPS-v2.1 | 0.379 | 10 | 1mo ago |
| HPD v2 (test) | DanceGRPO | HPSv2.1 | 0.371 | 7 | 1mo ago |
| Human Preference Alignment Out-of-Domain (test) | TAFS-GRPO | HPS-v2.1 | 35.3 | 7 | 1mo ago |
| Human Preference Alignment In-Domain (test) | TAFS-GRPO | Pick Score | 22.46 | 7 | 1mo ago |
| Multi-Challenge | Qwen3-30A3-2507 | Avg@3 | 49.4 | 6 | 1mo ago |
| ArenaHard V2 | Qwen3-30A3-2507 | Avg@3 Score | 60 | 6 | 1mo ago |
| Human Preference Alignment | OP-GRPO | PickScore | 23.64 | 5 | 12d ago |
| HPDv2 | TreeGRPO | HPS-v2.1 | 0.3735 | 5 | 1mo ago |
| PickScore | VGPO | PickScore (Task) | 23.55 | 5 | 1mo ago |
| VideoGen-RewardBench (test) | VideoReward | VQ Acc (w/ Tie) | 66 | 5 | 1mo ago |
| User study dataset HH-RLHF and PKU-SafeRLHF prompts (test) | DPO-HPS | Quality Score | 3.93 | 4 | 26d ago |
| PickScore | SuperFlow | PickScore | 86.851 | 4 | 1mo ago |
| DrawBench Task-specific (test) | DenseGRPO | PickScore (Task Metric) | 24.64 | 4 | 1mo ago |
| PKU-Safety | DPO-HPS | Win Rate | 67.1 | 3 | 26d ago |
Showing 19 of 19 rows