
Judge Performance on PERSUADE

0.955 CCC

GPT4o

Mar 5, 2026
Updated 1mo ago

Evaluation Results

Date     CCC
2026.03  0.955  0.96   0.944  0.226
2026.03  0.944  0.953  0.932  0.29
2026.03  0.942  0.948  0.837  0.4
2026.03  0.932  0.937  0.904  0.4
2026.03  0.927  0.935  0.916  0.344
2026.03  0.906  0.922  0.901  0.4
2026.03  0.898  0.936  0.887  0.5
2026.03  0.881  0.965  0.942  0.5
2026.03  0.877  0.888  0.834  0.6
2026.03  0.868  0.928  0.901  0.5
2026.03  0.853  0.886  0.85   0.4
2026.03  0.846  0.899  0.914  0.6
2026.03  0.838  0.901  0.889  0.484
2026.03  0.823  0.905  0.891  0.6
2026.03  0.819  0.915  0.914  0.6
2026.03  0.744  0.963  0.973  0.6
2026.03  0.741  0.856  0.829  0.7
2026.03  0.739  0.943  0.958  0.9
2026.03  0.705  0.836  0.831  0.8
2026.03  0.682  0.85   0.863  0.933
2026.03  0.682  0.834  0.864  1.1
2026.03  0.649  0.771  0.864  1.1
2026.03  0.634  0.709  0.733  1
2026.03  0.627  0.834  0.911  1.3
2026.03  0.6    0.901  0.846  1.3
2026.03  0.564  0.752  0.761  1.1
2026.03  0.546  0.702  0.715  1.2
2026.03  0.465  0.6    0.659  1.3
2026.03  0.446  0.538  0.516  0.8
2026.03  0.343  0.595  0.616  1.7
2026.03  0.21   0.427  0.468  1.2
2026.03  0.125  0.327  0.444  1.2