RMB

Benchmarks

Task Name              Dataset Name     SOTA Result                     Trend
Reward Modeling        RMB              Accuracy: 89.3                  120
Reward Modeling        RMB (test)       Score: 89.3                     22
Preference Evaluation  RMB Best-of-N    Helpfulness Score (BoN): 86.2   16
Reward Modeling        RMB              Help Accuracy: 88.6             13