SOTA Human Preference Evaluation benchmarks and papers with code | Wizwand
Human Preference Evaluation
Benchmarks
| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| ImageReward | SparVAR | Average Score | 1.0533 | 24 | 1mo ago |
| HPS v2.1 | +FastVAR | Photo Score | 29.87 | 24 | 1mo ago |
| HPD v2 (test) | HPSv3 | Preference Accuracy | 85.36 | 18 | 1mo ago |
| ImageReward (test) | MPS | Preference Accuracy | 0.675 | 18 | 1mo ago |
| Human Preference for Oral Argument Simulation (Evaluation set) | Gemini-2.5-Pro | Wins | 72 | 9 | 1mo ago |
| Basque Arena | GPT-4o | Arena Content Score | 1,183 | 7 | 1mo ago |
| MHP Overall (test) | MPS | Preference Accuracy | 74.2 | 7 | 1mo ago |
| 364 scientific paper pairs (original vs. AI-revised) | APRES | Preference Percentage | 47.8 | 4 | 1mo ago |
| VideoPhy 1.0 (test) | MAGI-1 | Physics Plausibility Win Rate | 59.3 | 4 | 1mo ago |
| PhysicsIQ 1.0 (test) | MAGI-1 | Physics Plausibility Win Rate | 54.9 | 4 | 1mo ago |
| Cooking | Stitch-a-Demo | Step Faithfulness Win Rate | 94 | 3 | 12d ago |
| Arena Creative Writing | JS | Win Rate | 23.4 | 3 | 1mo ago |
| Arena-Hard v0.1 | JS | Win Rate | 56.7 | 3 | 1mo ago |
| User Study (Group 2) | UAV-GPT | Category CT Score | 24 | 3 | 1mo ago |
| EgoDex and DreamDojo-HV novel out-of-distribution (eval) | DreamDojo-14B | Physics Correctness | 73.5 | 3 | 1mo ago |
| Human Preference Evaluation 371 prompts (test) | HPS | Recall @1 | 39.89 | 3 | 1mo ago |
| Human Preference Evaluation 466 prompts (test) | ImageReward | Preference Accuracy | 65.14 | 3 | 1mo ago |
| Kuaishou search long-tail query segment (test) | ExpModel | Good Count | 48 | 2 | 22d ago |
| PROSOCIALDIALOG | NormGenesis (with V2R) | Preference | 82 | 2 | 1mo ago |
| GenBlemish-27K (test) | Agentic Retoucher | Preference Share: Significantly Better | 48.8 | 2 | 1mo ago |
| Driver Attention Dataset Study 2 | in-lab human driver | Preference Rate | 59 | 2 | 1mo ago |
| Driver Attention Dataset Study 1 | in-lab human driver | Preference Rate | 71 | 2 | 1mo ago |
| RichHF-18K (test) | Finetuned Muse | Preference Rate (≫) | 0.215 | 1 | 1mo ago |
Showing 23 of 23 rows