Reward Model Evaluation on RewardBench (Spearman ρ)

Top result: 0.542 Spearman ρ, prob-weighted EV (M2)
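This page does not define "prob-weighted EV (M2)". One plausible reading, which is assumed here and not taken from the source, is that a judge model assigns probabilities to discrete score labels (for example 1 to 5), those probabilities are renormalized over the candidate labels, and the expected score is used as the reward. The sketch below follows that assumption. The function name `prob_weighted_ev` and its log-probability input format are hypothetical, and whatever "M2" denotes is not modeled.

```python
import math
from typing import Mapping


def prob_weighted_ev(score_logprobs: Mapping[int, float]) -> float:
    """Expected score under a renormalized distribution over discrete labels.

    Hypothetical interpretation (not confirmed by the leaderboard): keys are
    score values, values are the model's log-probabilities for emitting each
    score label. Probability mass outside these labels is discarded.
    """
    if not score_logprobs:
        raise ValueError("need at least one candidate score")
    # Softmax restricted to the candidate labels; subtracting the max
    # log-probability keeps exp() from underflowing or overflowing.
    top = max(score_logprobs.values())
    weights = {s: math.exp(lp - top) for s, lp in score_logprobs.items()}
    total = sum(weights.values())
    return sum(s * w for s, w in weights.items()) / total


# Usage: equal mass on scores 1 and 5 gives an expected score of 3.0.
print(prob_weighted_ev({1: math.log(0.5), 5: math.log(0.5)}))
```

Because the result is a continuous value rather than a single argmax label, scores produced this way can be ranked against reference ratings with fewer ties.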

[Chart: Spearman ρ of evaluated methods by date, May 15, 2026]

Evaluation Results

Rank  Method                  Date     Spearman ρ
1     prob-weighted EV (M2)   2026.05  0.542
2                             2026.05  0.537
3                             2026.05  0.522
4                             2026.05  0.487
5                             2026.05  0.432
6                             2026.05  0.425
7                             2026.05  0.422
8                             2026.05  0.418
9                             2026.05  0.413
10                            2026.05  0.398
11                            2026.05  0.375
12                            2026.05  0.367
13                            2026.05  0.345
14                            2026.05  0.338
15                            2026.05  0.327
16                            2026.05  0.32
17                            2026.05  0.298
18                            2026.05  0.291
19                            2026.05  0.276
20                            2026.05  0.241
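The scores above are Spearman's ρ, which is the Pearson correlation computed on ranks instead of raw values. This page does not say exactly which two quantities are correlated. The sketch below assumes ρ is taken between a method's scalar reward scores and reference scores for the same items. The helper names are illustrative. Tied values receive the average of their rank positions, as in the standard definition; in practice `scipy.stats.spearmanr` computes the same statistic.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Usage: model scores that order three items exactly like the reference give rho = 1.0.
print(spearman_rho([0.9, 0.1, 0.5], [3, 1, 2]))
```

Because only ranks enter the formula, ρ rewards a reward model for ordering items correctly, regardless of the scale or calibration of its raw scores.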