LLM-as-a-Judge on RewardBench 1.0 (test)

Rstd: 0.54


Oct 20, 2024
Updated 1mo ago

Evaluation Results

Date      Rstd   Score
2024.10   0.54   89.11
2024.10   0.94   88.06
2024.10   1.42   89.54
2024.10   1.68   89.38
2024.10   1.84   87.84
2024.10   1.95   89.34
2024.10   2.72   64.25
2024.10   3.02   62.47
2024.10   3.65   86.32
2024.10   3.82   87.25
2024.10   3.91   88.64
2024.10   4.01   87.2
2024.10   4.18   64.09
2024.10   5.51   67.13
2024.10   6.48   68.12
2024.10   6.57   83.87
2024.10   6.88   67.11
2024.10   7.01   60.21
2024.10   7.22   65.24
2024.10   7.51   66.54
2024.10   7.72   85.74
2024.10   7.93   64.89
2024.10   8.43   65.39
2024.10   8.54   66.36
2024.10   9.76   61.88
2024.10  11.4    66.89
2024.10  11.63   63.14
2024.10  12.02   64.25
2024.10  13.79   66.31
2024.10  13.94   65.9
2024.10  14.54   66.9
2024.10  14.62   64.52
2024.10  15.01   65.79
2024.10  15.61   66.35
2024.10  16.79   65.27
2024.10  17.93   64.96