
Reward Model Evaluation on Reward Bench 2 (test)

Best RB2 Factuality MAE: 0.451 (lower is better), achieved by Distribution-Calibrated Aggregation
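The page does not say how RB2 Factuality MAE is computed. As a minimal sketch, assuming the standard mean absolute error, MAE = (1/n) Σ |ŷᵢ − yᵢ|, the metric averages the absolute gap between paired predictions and targets. What the predictions and targets are for RewardBench 2 is not stated here, and the function name and example values are hypothetical.

```python
def mean_absolute_error(predictions, targets):
    """Average of |prediction - target| over paired values (standard MAE)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("need at least one prediction/target pair")
    # Sum the absolute errors, then divide by the number of pairs.
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)


# Hypothetical values for illustration only, not taken from RewardBench 2.
print(mean_absolute_error([3, 5, 2], [2, 5, 4]))  # 1.0
```

Because MAE measures error, a smaller value is better, which is why the leaderboard lists entries in ascending order.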

[Chart: RB2 Factuality MAE over time, Dec 2, 2025]

Evaluation Results

Date     RB2 Factuality MAE  Col 2  Col 3  Col 4  Col 5  Col 6
2025.12  0.451               0.287  0.285  0.414  0.285  0.081
2025.12  0.487               0.332  0.306  0.451  0.319  0.094
2025.12  0.573               0.423  0.415  0.53   0.405  0.221
2025.12  0.575               0.424  0.41   0.574  0.407  0.226
2025.12  0.591               0.441  0.45   0.524  0.414  0.197
2025.12  0.599               0.439  0.427  0.597  0.406  0.208
2025.12  0.615               0.394  0.36   0.498  0.373  0.155
2025.12  0.647               0.403  0.384  0.552  0.402  0.158
2025.12  0.67                0.415  0.385  0.57   0.405  0.165
2025.12  0.675               0.415  0.391  0.551  0.412  0.178
2025.12  0.681               0.397  0.4    0.581  0.406  0.177
2025.12  0.711               0.37   0.372  0.603  0.409  0.177
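The ranking can be checked directly from the first score column of the Evaluation Results rows. A small sketch, assuming that column is RB2 Factuality MAE (the headline 0.451 matches the first row) and that lower error ranks higher; the variable names are illustrative only.

```python
# First score column of each Evaluation Results row, in the order listed.
factuality_mae = [0.451, 0.487, 0.573, 0.575, 0.591, 0.599,
                  0.615, 0.647, 0.67, 0.675, 0.681, 0.711]

# For an error metric, the best entry is the one with the smallest value.
best = min(factuality_mae)

# Check that the listing is sorted ascending, i.e. best-first.
is_sorted_best_first = all(a <= b for a, b in zip(factuality_mae, factuality_mae[1:]))

print(best, is_sorted_best_first)  # 0.451 True
```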