
LLM-as-a-Judge on PandaLM Human Annotations (test)

Agreement: 0.7683

FairJudge-8B

[Chart: y-axis ticks 0.280124, 0.406862, 0.5336, 0.660338; x-axis tick Feb 6, 2026]
Updated 4d ago

Evaluation Results

Date     Agreement
2026.02  0.7683     0.7187  0.7254  0.7218
2026.02  0.728      0.6625  0.636   0.6458
2026.02  0.7153     0.6775  0.5999  0.6168
2026.02  0.7134     0.643   0.6218  0.63
2026.02  0.7005     0.6369  0.648   0.6379
2026.02  0.7002     0.6986  0.5769  0.5955
2026.02  0.6931     0.6465  0.6102  0.6218
2026.02  0.6677     0.6383  0.7195  0.6392
2026.02  0.6667     0.6507  0.6867  0.6227
2026.02  0.6582     0.4456  0.4893  0.4624
2026.02  0.6476     0.5848  0.6113  0.5883
2026.02  0.3713     0.4011  0.3924  0.2898
2026.02  0.2989     0.4316  0.4167  0.2828