
LLM win-rate estimation ranking on LLM benchmark (Appendix)

[Chart: Spearman Correlation, AIPW-EM; axis ticks -0.04, 0.23, 0.5, 0.77; Apr 23, 2026]
Updated 1mo ago

Evaluation Results

Method  Links
2026.04    1       100     0
2026.04    1       100     0
2026.04    1       100     0
2026.04    1       100     0
2026.04    1       100     0
2026.04    1       100     0
2026.04    0.75    87.5    0.003
2026.04    0.5     75      0.006
2026.04    0.5     75      0.006
2026.04    0.5     75      0.006
2026.04    0.25    62.5    0.01
2026.04    0.25    62.5    0.01
2026.04    0.25    62.5    0.01
2026.04    0       50      0.013