RMB

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reward Modeling | RMB | Accuracy: 89.3 | 120 |
| Reward Modeling | RMB (test) | Score: 89.3 | 22 |
| Preference Evaluation | RMB Best-of-N | Helpfulness Score (BoN): 86.2 | 16 |
| Reward Modeling | RMB | Help Accuracy: 88.6 | 13 |
| Best-of-N evaluation | RMB | Accuracy: 59.69 | 2 |