
Reward Bench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reward Modeling Evaluation | Reward Bench Factuality 2 | Pairwise Accuracy | 56.6 | 64 |
| Reward Modeling | Reward Bench Math | EF | 0.305 | 52 |
| Reward Modeling | Reward Bench safety subset response perturbations 2 | LE Score | -0.629 | 26 |
| Reward Modeling | Reward Bench safety subset prompt perturbations 2 | EF | -0.18 | 26 |
| Reward Modeling | Reward Bench V2 | Accuracy | 83.44 | 22 |
| Reward Modeling | Reward Bench Prior Sets | Prior Sets Score | 78.2 | 17 |
| Reward Model Evaluation | Reward Bench 2 (test) | RB2 Factuality MAE | 0.451 | 12 |
| Reward Modeling Evaluation | Reward Bench Ties 2 | Pairwise Accuracy | 91.8 | 12 |
| Reward Modeling Evaluation | Reward Bench Safety 2 | Pairwise Accuracy | 72.3 | 12 |
| Reward Modeling Evaluation | Reward Bench Math 2 | Pairwise Accuracy | 72.3 | 12 |
| Reward Modeling Evaluation | Reward-Bench | Agreement | 84.79 | 12 |
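The leaderboard reports several results as "Pairwise Accuracy" without defining the metric. As an illustration only, here is a minimal sketch that assumes the common reading: the percentage of (chosen, rejected) response pairs for which a reward model gives the chosen response a strictly higher score. The function name `pairwise_accuracy` and the rule that ties count as misses are assumptions, not taken from the benchmark's own evaluation code, and individual subsets may use a different scoring rule.

```python
from typing import Sequence


def pairwise_accuracy(chosen_scores: Sequence[float],
                      rejected_scores: Sequence[float]) -> float:
    """Return the percentage of pairs where the chosen response outscores the rejected one.

    Hypothetical helper for illustration. Assumption: ties (equal scores)
    count as incorrect, because the comparison is strict.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("chosen_scores and rejected_scores must have the same length")
    if len(chosen_scores) == 0:
        raise ValueError("at least one preference pair is required")
    # Count pairs where the reward model prefers the human-chosen response.
    wins = sum(1 for c, r in zip(chosen_scores, rejected_scores) if c > r)
    # Report as a percentage, matching the scale used in the table.
    return 100.0 * wins / len(chosen_scores)


if __name__ == "__main__":
    chosen = [2.1, 0.5, 1.3, 0.9]
    rejected = [1.0, 0.7, 1.3, -0.2]
    # Pairs 1 and 4 are wins; pair 2 is a loss; pair 3 is a tie (counted as a miss).
    print(pairwise_accuracy(chosen, rejected))  # 50.0
```

Counting ties as misses keeps a reward model that outputs a constant score from earning credit; a benchmark could instead award half credit for ties, which would change the reported numbers.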