SOTA Preference Prediction benchmarks and papers with code | Wizwand
Preference Prediction
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| PRISM (test) | EXACT | Accuracy | 66.62 | 51 | 3mo ago |
| Arena-Expert-5K, HelpSteer3, HH-RLHF, and UltraFeedback (held-out) | PREMISE (+ pref repair) | Accuracy | 70.5 | 42 | 2d ago |
| PPE Preference (test) | Skywork-Reward-V2-Llama-3.1-8B-40M | Preference Score | 79.8 | 24 | 3mo ago |
| MMRB2 out-of-domain | AutoRubric-T2I (on PickScore) | EvalMuse Score | 70.7 | 22 | 15d ago |
| MovieLens 10k | logit | Accuracy (Q=0.25) | 61 | 20 | 6d ago |
| PickScore (test) | DRM | Accuracy | 73.4 | 19 | 7d ago |
| RewardBench | C2 | Accuracy | 91.8 | 19 | 21d ago |
| HPD v3 (test) | HPSv3 - 7B | Accuracy | 76.9 | 14 | 7d ago |
| Sushi Preference Category B | logit | Accuracy (0.25 Quantile) | 67 | 10 | 6d ago |
| Netflix Prize 150k ratings | logit | Accuracy (0.25 Quantile) | 61 | 10 | 6d ago |
| Netflix Prize 100k ratings | logit | Accuracy (0.25 Quantile) | 62 | 10 | 6d ago |
| MovieLens 50k ratings | logit | Accuracy (0.25 Quantile) | 59 | 10 | 6d ago |
| MovieLens 1k ratings | logit | Accuracy (0.25 Quantile) | 62 | 10 | 6d ago |
| JudgeBench | Reasoning RM + External-Rubric (32B) | Positional Consistent Accuracy | 63.9 | 10 | 1mo ago |
| RewardBench 2 | Reasoning RM + External-Rubric (32B) | Accuracy | 73.9 | 10 | 1mo ago |
| RM-Bench | C2 | Accuracy | 87.8 | 10 | 1mo ago |
| Pets | SPL | Accuracy | 100 | 8 | 2mo ago |
| UltraFeedback 500 held-out users (test) | RFM(32) | Test Accuracy | 70.53 | 7 | 3mo ago |
| Sushi Preference Category A | direct | Accuracy (Q0.25) | 68 | 5 | 6d ago |
| Eigen-taste Jokes onehot variant | logit | Accuracy (0.25 Q) | 61 | 5 | 6d ago |
| HPSv3 (test) | HPSv3 | Accuracy | 74 | 5 | 15d ago |
| Meta-World Pick-Place (Novel Task) | ReCouPLe-EC | Reward Accuracy | 66.3 | 4 | 2mo ago |
| Meta-World Pick-Place-Wall (train) | ReCouPLe-IC | Reward Accuracy | 65.7 | 4 | 2mo ago |
| Meta-World Push-Wall (train) | RFP | Reward Accuracy | 90 | 4 | 2mo ago |
| Meta-World Push (train) | ReCouPLe-IC | Reward Accuracy | 89.3 | 4 | 2mo ago |
Showing 25 of 31 rows