SOTA Preference Evaluation benchmarks and papers with code | Wizwand
Preference Evaluation
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| AlpacaEval 2 | SelectiveDPO | WR (%) | 559 | 48 | 2d ago |
| VCD | SFT | Multi-choice One Preference p+ | 94.15 | 39 | 1mo ago |
| Mturk user study Overall Preferences 1.0 | Policy R1 | Overall Preference Fraction | 76 | 30 | 3mo ago |
| ImageReward | Q-Real-Score | Accuracy | 60.7 | 20 | 2mo ago |
| RMB Best-of-N | Skywork-Reward-V2-Llama-3.1-8B-40M | Helpfulness Score (BoN) | 86.2 | 16 | 3mo ago |
| SpeechJudge | UrgentMOS | Acc@0.5 | 75 | 15 | 3mo ago |
| SpeechEval | UrgentMOS | Acc@0.5 | 83 | 15 | 3mo ago |
| URGENT SQA 24 | UrgentMOS | Acc@0.5 | 59 | 15 | 3mo ago |
| TMHINT-QI | UrgentMOS | Acc@0.5 | 63 | 15 | 3mo ago |
| SOMOS | UrgentMOS | Acc@0.5 | 60 | 15 | 3mo ago |
| URGENT25-SQA | UrgentMOS | Acc@0.5 | 59 | 15 | 3mo ago |
| CHiME UDASE 7 (test) | UrgentMOS | Acc@0.5 | 66 | 15 | 3mo ago |
| NISQA-FOR | UrgentMOS | Acc@0.5 | 81 | 15 | 3mo ago |
| NISQA-P501 | SCOREQ | Acc@0.5 | 81 | 15 | 3mo ago |
| Detail | Pluralistic | Avg Score | 8.25 | 14 | 3mo ago |
| OCR-like | Ours (Homo.) | Average Score | 9.06 | 14 | 3mo ago |
| Medical | Ours (Homo.) | Avg Score | 8.58 | 14 | 3mo ago |
| UltraFeedback core250 (test) | TEA | Win Rate | 80 | 12 | 21d ago |
| PPE Preference (test) | Probe | Kuiper Statistic | 0.0434 | 8 | 3mo ago |
| HH-Helpful | DLMA-13B | Win Count | 52 | 8 | 3mo ago |
| HH-Harmless | DLMA-13B | Win Rate | 60 | 8 | 3mo ago |
| PKU-SafeRLHF | DLMA-13B | Win Rate | 57 | 8 | 3mo ago |
| Anthropic-SafeRLHF (target) | πbias (rubric-based preference attack) | Win Rate | 41.7 | 2 | 3mo ago |
| Anthropic-SafeRLHF benchmark | πbias (rubric-based preference attack) | Win Rate | 33.7 | 2 | 3mo ago |
| Ultra-Real (target) | πbias (rubric-based preference attack) | Win Rate | 43 | 2 | 3mo ago |
Showing 25 of 27 rows