
Reward Modeling Suitability Evaluation on RM Bench Math

-0.077 EF

SOLAR-10.7B-Instruct-v1.0

Nov 30, 2025
Updated 4d ago

Evaluation Results

Method  Links
-0.077    0.01  -0.039  -0.065  -0.001  -0.047  -0.032  -0.033  -0.002   0.011
-0.077       0  -0.043  -0.118   0.144  -0.076  -0.131   -0.06   0.135   0.016
-0.057  -0.059   0.001  -0.007  -0.009  -0.031   -0.04   0.015   0.061   0.014
-0.038   0.033  -0.044   -0.07   0.106  -0.013  -0.076  -0.045   0.036  -0.003
-0.034  -0.095  -0.041  -0.037   0.108  -0.065  -0.115  -0.047   0.014  -0.069
-0.031  -0.064    0.05   0.007   0.014  -0.079  -0.065   0.037   0.067  -0.013
-0.021   0.007   0.012  -0.023   0.009  -0.004   0.019   0.128  -0.025  -0.011
 0.001   0.013  -0.016   -0.08  -0.057  -0.027  -0.023  -0.063  -0.006  -0.069
 0.009   0.089  -0.079   0.048   0.098   -0.12   0.074   0.021   0.047  -0.021
  0.01  -0.049  -0.019  -0.002   0.063  -0.026  -0.089  -0.029   0.027  -0.015
 0.018    0.05    0.06   0.024   0.109   -0.05  -0.044  -0.013   0.043   0.028
 0.023   0.011  -0.029   0.005  -0.008   0.013   0.033  -0.031   0.008  -0.125
 0.028   0.023  -0.075  -0.052   0.003   0.014   0.026   0.089  -0.083  -0.004
 0.035   0.051   0.045   -0.01   0.007  -0.057  -0.066   0.016   0.007    0.02
 0.038  -0.005  -0.025  -0.003   0.149  -0.003  -0.066   0.012   0.074   0.031
 0.041  -0.027  -0.032   -0.03   0.017    0.04   0.018  -0.013   -0.08   0.045
 0.047   0.016   0.015  -0.003   0.056  -0.065  -0.046   0.005   0.019   0.003
 0.049    0.05   0.081   -0.01   0.043   0.027  -0.094  -0.042   0.096   0.036
  0.05  -0.037   0.031  -0.017    0.05  -0.088  -0.064   0.006   0.044  -0.016
 0.054  -0.017  -0.017  -0.034   0.044  -0.033  -0.038  -0.037   0.067  -0.055
 0.067   -0.01   0.008   -0.01   0.148  -0.027  -0.083   0.009   0.066   0.001
 0.067    0.06   0.095   0.099   0.105  -0.078  -0.111   0.011  -0.034  -0.016
 0.071   0.055   0.081   0.097   0.047   0.076   0.041   0.122   0.124   0.084
 0.093   0.007   0.035   0.032    0.22  -0.098  -0.097   0.228   0.268   0.235
 0.123  -0.028   0.183   0.212   0.186  -0.018   -0.16   0.039   0.176   0.049
 0.154   0.097   0.227   0.214   0.191  -0.077   -0.25   0.017   0.129   0.032