Multi-judge evaluation on Shared 500-prompt sample

Global Correlation (r): 0.87

GPT-5.2

Mar 12, 2026
Updated 2mo ago

Evaluation Results

Method    Links
2026.03    0.87    0.72069.4
2026.03    0.59    0.4229    47.7
2026.03    0.56    0.4717    43.6
2026.03    0.47    0.2742    23.8
2026.03    0.31    0.2325    18.6