Our new X account is live! Follow @wizwand_team for updates
SOTA Preference Evaluation benchmarks and papers with code | Wizwand
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| Mturk user study Overall Preferences 1.0 | Policy R1 | Overall Preference Fraction | 76 | 30 | 3d ago |
| SpeechJudge | UrgentMOS | Acc@0.5 | 75 | 15 | 3d ago |
| SpeechEval | UrgentMOS | Acc@0.5 | 83 | 15 | 3d ago |
| URGENT SQA 24 | UrgentMOS | Acc@0.5 | 59 | 15 | 3d ago |
| TMHINT-QI | UrgentMOS | Acc@0.5 | 63 | 15 | 3d ago |
| SOMOS | UrgentMOS | Acc@0.5 | 60 | 15 | 3d ago |
| URGENT25-SQA | UrgentMOS | Acc@0.5 | 59 | 15 | 3d ago |
| CHiME UDASE 7 (test) | UrgentMOS | Acc@0.5 | 66 | 15 | 3d ago |
| NISQA-FOR | UrgentMOS | Acc@0.5 | 81 | 15 | 3d ago |
| NISQA-P501 | SCOREQ | Acc@0.5 | 81 | 15 | 3d ago |
| ImageReward | BLPO | F1 Score | 34 | 15 | 3d ago |
| Detail | Pluralistic | Avg Score | 8.25 | 14 | 3d ago |
| OCR-like | Ours (Homo.) | Average Score | 9.06 | 14 | 3d ago |
| Medical | Ours (Homo.) | Avg Score | 8.58 | 14 | 3d ago |
| AlpacaEval 2 | SelectiveDPO | WR (%) | 559 | 12 | 3d ago |
| PPE Preference (test) | Probe | Kuiper Statistic | 0.0434 | 8 | 3d ago |
| HH-Helpful | DLMA-13B | Win Count | 52 | 8 | 3d ago |
| HH-Harmless | DLMA-13B | Win Rate | 60 | 8 | 3d ago |
| PKU-SafeRLHF | DLMA-13B | Win Rate | 57 | 8 | 3d ago |
| Anthropic-SafeRLHF (target) | πbias (rubric-based preference attack) | Win Rate | 41.7 | 2 | 3d ago |
| Anthropic-SafeRLHF benchmark | πbias (rubric-based preference attack) | Win Rate | 33.7 | 2 | 3d ago |
| Ultra-Real (target) | πbias (rubric-based preference attack) | Win Rate | 43 | 2 | 3d ago |
| Ultra-Real benchmark | πbias (rubric-based preference attack) | Win Rate | 43.1 | 2 | 3d ago |
Showing 23 of 23 rows