
Reward Modeling on RewardBench v1.0 (test)

0.9777 Chat Score

BT + margin

[Chart: score over time, Mar 6, 2025 to Dec 6, 2025]
Updated 4d ago

Evaluation Results

Date     Chat Score                                  Method  Links
2025.12  0.9777      0.4759  0.8128  0.6997  0.7415
2025.12  0.9693      0.4978  0.8419  0.7693  0.7696
2025.12  0.9665      0.4583  0.8405  0.7281  0.7484
2025.03  0.965       0.455   0.754   0.862   0.759
2025.12  0.9609      0.3838  0.7757  0.7809  0.7253
2025.12  0.9581      0.375   0.7865  0.7298  0.7123
2025.12  0.9581      0.398   0.7797  0.8071  0.7357
2025.12  0.9553      0.4989  0.8169  0.717   0.7524
2025.12  0.9553      0.4978  0.7973  0.7252  0.7439
2025.12  0.9525      0.4035  0.7797  0.7541  0.7225
2025.12  0.9385      0.3772  0.7595  0.7228  0.6995
2025.03  0.927       0.542   0.759   0.716   0.736
2025.03  0.922       0.434   0.799   0.837   0.748
2025.03  0.886       0.564   0.773   0.645   0.717
2025.03  0.883       0.581   0.807   0.828   0.775
2025.12  0.8603      0.7829  0.8986  0.6705  0.8031
2025.03  0.855       0.491   0.771   0.765   0.72
2025.12  0.838       0.7346  0.825   0.8071  0.8012
2025.12  0.838       0.7873  0.8878  0.746   0.8148
2025.03  0.824       0.469   0.7     0.568   0.64
2025.12  0.8128      0.7336  0.8243  0.7746  0.7863
2025.03  0.777       0.513   0.569   0.482   0.585
2025.03  0.679       0.423   0.673   0.378   0.538
2025.03  0.629       0.305   0.12    0.368   0.355
2025.03  0.595       0.358   0.651   0.312   0.479
2025.03  0.592       0.29    0.7     0.355   0.484
2025.03  0.564       0.292   0.385   0.194   0.359
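
Each results row above carries five values, and only the first (Chat Score) is labeled on the page. In the rows sampled here, the fifth value looks like the unweighted mean of the first four. That reading is an assumption, not something the page states. The sketch below checks it for three rows copied from the results; `mean_matches` and its tolerance are hypothetical names and settings chosen for illustration.

```python
# Three rows copied from the results above: four scores followed by a fifth value.
# Assumption (not stated on the page): the fifth value is the mean of the first four.
rows = [
    (0.9777, 0.4759, 0.8128, 0.6997, 0.7415),
    (0.965, 0.455, 0.754, 0.862, 0.759),
    (0.564, 0.292, 0.385, 0.194, 0.359),
]


def mean_matches(row, tol=1e-3):
    """Return True if the last value is within tol of the mean of the others."""
    *scores, reported = row
    return abs(sum(scores) / len(scores) - reported) <= tol


for row in rows:
    print(row, mean_matches(row))
```

The tolerance allows for rounding in the published figures. A row that fails the check would suggest either a different aggregation or a transcription error, so the result is a sanity check rather than a definition of the column.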