RM-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reward Modeling | RM-Bench | Average Score 89.4 | 53 |
| Reward Modeling | RM Bench Code | EF 0.154 | 52 |
| Reward Modeling | RM-Bench (test) | Overall Score 87.1 | 39 |
| Reward Modeling | RM-Bench Chat Hard | Accuracy 83.3 | 34 |
| Reward Modeling Suitability Evaluation | RM Bench Math | EF -0.077 | 26 |
| Reward Modeling Suitability Evaluation | RM Bench Safety-accept | EF 0.698 | 26 |
| Reward Model Suitability Audit | RM Bench Chat | EF 0.313 | 26 |
| Reward Modeling | RM-Bench Chat | Accuracy 78.5 | 18 |
| Reward Modeling | RM-Bench Chat subset Normal | Accuracy 86 | 16 |
| Reward Modeling | RM-Bench (full) | Chat Score 83 | 11 |
| Reward Modeling | RM-Bench Hard | Accuracy 0.697 | 10 |
| Reward Modeling | RM-Bench Normal | Accuracy 80 | 10 |
| Reward Modeling | RM-Bench Easy | Accuracy 92.2 | 10 |
| Reward Modeling | RM-Bench v1.0 (test) | Chat Score 71.23 | 5 |