Reward Model Evaluation on Reward Bench 2 (test)

0.451 RB2 Factuality MAE

Distribution-Calibrated Aggregation

Dec 2, 2025
Updated 1mo ago

Evaluation Results

Method  Links
2025.12  0.451  0.287  0.285  0.414  0.285  0.081
2025.12  0.487  0.332  0.306  0.451  0.319  0.094
2025.12  0.573  0.423  0.415  0.53   0.405  0.221
2025.12  0.575  0.424  0.41   0.574  0.407  0.226
2025.12  0.591  0.441  0.45   0.524  0.414  0.197
2025.12  0.599  0.439  0.427  0.597  0.406  0.208
2025.12  0.615  0.394  0.36   0.498  0.373  0.155
2025.12  0.647  0.403  0.384  0.552  0.402  0.158
2025.12  0.67   0.415  0.385  0.57   0.405  0.165
2025.12  0.675  0.415  0.391  0.551  0.412  0.178
2025.12  0.681  0.397  0.4    0.581  0.406  0.177
2025.12  0.711  0.37   0.372  0.603  0.409  0.177