LLM-as-a-Judge on High-contrast response pairs

Discriminability (πi): 0.87

LongCat-Flash-Chat

[Chart: Discriminability (πi); axis ticks 0.4956, 0.5928, 0.69, 0.7872; Apr 24, 2026]
Updated 1mo ago

Evaluation Results

Date | Discriminability (πi) | Other values (as extracted) | Links
2026.04 | 0.87 | 0.30710 | -
2026.04 | 0.86 | 0.0910 | -
2026.04 | 0.85 | 0.03510 | -
2026.04 | 0.85 | -0.04310 | -
2026.04 | 0.85 | 0.110 | -
2026.04 | 0.85 | 0.12410 | -
2026.04 | 0.84 | -0.22910 | -
2026.04 | 0.84 | -0.11710 | -
2026.04 | 0.84 | -0.15110 | -
2026.04 | 0.83 | -0.15210 | -
2026.04 | 0.82 | 0.22610 | -
2026.04 | 0.82 | 0.02410 | -
2026.04 | 0.81 | 0.15210 | -
2026.04 | 0.78 | 0.18110 | -
2026.04 | 0.77 | -0.0610 | -
2026.04 | 0.75 | -0.05210 | -
2026.04 | 0.55 | -0.097, 0.184 | -
2026.04 | 0.52 | 0.095, 0.381 | -
2026.04 | 0.51 | -0.001, 0.46 | -
2026.04 | 0.51 | -0.102, 0.457 | -