Reward Model Suitability Audit on RM Bench Chat

0.313 EF

GRM-llama3-8B-distill

Nov 30, 2025
Updated 4d ago

Evaluation Results

Method  Links
2025.11    0.313    0.019    0.349    0.422    0.247   -0.036    0.032    0.337    0.581    0.19
2025.11    0.302    0.006    0.344    0.406    0.387   -0.062    0.1      0.313    0.465    0.195
2025.11    0.16     0.242    0.162    0.114    0.35     0.036    0.046    0.151    0.229    0.258
2025.11    0.148   -0.03     0.315    0.382    0.254   -0.01     0.009    0.304    0.549    0.176
2025.11    0.131   -0.001    0.034    0.046    0.115    0.087   -0.037    0.043    0.102   -0.009
2025.11    0.12    -0.032   -0.039    0.186    0.05    -0.197   -0.107    0.347    0.435    0.079
2025.11    0.115   -0.144    0.363    0.36     0.155    0.162    0.247    0.336    0.525    0.34
2025.11    0.091   -0.166    0.174    0.366    0.162    0.121    0.255    0.267    0.303   -0.118
2025.11    0.076    0.148    0.23     0.263    0.391   -0.087    0.162    0.359    0.639    0.204
2025.11    0.068    0.057   -0.018    0.03     0.037   -0.107    0.039    0.37     0.49     0.114
2025.11    0.067    0.143    0.188    0.255    0.271   -0.382   -0.489    0.449    0.549    0.171
2025.11    0.062   -0.009   -0.062   -0.07     0.042    0.395    0.389    0.207    0.393    0.399
2025.11    0.052    0.085    0.132    0.109    0.174    0.19     0.077    0.026    0.182    0.192
2025.11    0.047   -0.067   -0.163   -0.117    0.067    0.774    0.906    0.266    0.458    0.696
2025.11    0.029   -0.003    0.265    0.25     0.26    -0.062    0.001    0.199    0.515    0.476
2025.11    0.004    0.029    0.018    0.239   -0.023    0.092    0.081    0.262    0.241    0.037
2025.11   -0.006    0.066    0.502    0.466    0.202   -0.069    0.013    0.328    0.55     0.128
2025.11   -0.033    0.041   -0.102   -0.007    0.043    0.002    0.312   -0.221    0.229   -0.008
2025.11   -0.052    0.214    0.265    0.273    0.306    0.058    0.234    0.381    0.457    0.254
2025.11   -0.059    0.002    0.049    0.01    -0.063   -0.127   -0.039    0.237    0.446    0.025
2025.11   -0.065    0.247    0.509    0.53     0.385   -0.202   -0.112    0.376    0.741    0.047
2025.11   -0.071   -0.127    0.061   -0.165   -0.055   -0.216   -0.117    0.35     0.5     -0.004
2025.11   -0.081    0.034    0.151    0.019   -0.302    0.206    0.166    0.109    0.141   -0.21
2025.11   -0.1      0.147    0.17     0.205    0.228   -0.044    0.255    0.435    0.436    0.149
2025.11   -0.12    -0.135   -0.002   -0.088   -0.04     0.094    0.093    0.008   -0.074    0.019
2025.11   -0.145    0.043   -0.105   -0.113    0.037    0.089    0.081   -0.045   -0.081    0.006