SOTA Preference Evaluation benchmarks and papers with code | Wizwand
Preference Evaluation
Benchmarks
| Dataset Name | SOTA Method | Metric | Metric Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| VCD | SFT | Multi-choice One Preference p+ | 94.15 | 39 | 4d ago |
| Mturk user study Overall Preferences 1.0 | Policy R1 | Overall Preference Fraction | 76 | 30 | 1mo ago |
| ImageReward | Q-Real-Score | Accuracy | 60.7 | 20 | 17d ago |
| RMB Best-of-N | Skywork-Reward-V2-Llama-3.1-8B-40M | Helpfulness Score (BoN) | 86.2 | 16 | 1mo ago |
| SpeechJudge | UrgentMOS | Acc@0.5 | 75 | 15 | 1mo ago |
| SpeechEval | UrgentMOS | Acc@0.5 | 83 | 15 | 1mo ago |
| URGENT SQA 24 | UrgentMOS | Acc@0.5 | 59 | 15 | 1mo ago |
| TMHINT-QI | UrgentMOS | Acc@0.5 | 63 | 15 | 1mo ago |
| SOMOS | UrgentMOS | Acc@0.5 | 60 | 15 | 1mo ago |
| URGENT25-SQA | UrgentMOS | Acc@0.5 | 59 | 15 | 1mo ago |
| CHiME UDASE 7 (test) | UrgentMOS | Acc@0.5 | 66 | 15 | 1mo ago |
| NISQA-FOR | UrgentMOS | Acc@0.5 | 81 | 15 | 1mo ago |
| NISQA-P501 | SCOREQ | Acc@0.5 | 81 | 15 | 1mo ago |
| Detail | Pluralistic | Avg Score | 8.25 | 14 | 1mo ago |
| OCR-like | Ours (Homo.) | Average Score | 9.06 | 14 | 1mo ago |
| Medical | Ours (Homo.) | Avg Score | 8.58 | 14 | 1mo ago |
| AlpacaEval 2 | SelectiveDPO | WR (%) | 559 | 14 | 3d ago |
| PPE Preference (test) | Probe | Kuiper Statistic | 0.0434 | 8 | 1mo ago |
| HH-Helpful | DLMA-13B | Win Count | 52 | 8 | 1mo ago |
| HH-Harmless | DLMA-13B | Win Rate | 60 | 8 | 1mo ago |
| PKU-SafeRLHF | DLMA-13B | Win Rate | 57 | 8 | 1mo ago |
| Anthropic-SafeRLHF (target) | πbias (rubric-based preference attack) | Win Rate | 41.7 | 2 | 1mo ago |
| Anthropic-SafeRLHF benchmark | πbias (rubric-based preference attack) | Win Rate | 33.7 | 2 | 1mo ago |
| Ultra-Real (target) | πbias (rubric-based preference attack) | Win Rate | 43 | 2 | 1mo ago |
| Ultra-Real benchmark | πbias (rubric-based preference attack) | Win Rate | 43.1 | 2 | 1mo ago |
Showing 25 of 25 rows