LLM-as-a-Judge Evaluation Consistency on PreferenceBench

79.73 Kappa

CalibraEval

[Chart: Kappa by date, Oct 20, 2024]
Updated 4d ago

Evaluation Results

Method        Date      Kappa   -       -
CalibraEval   2024.10   79.73   97.29   97.6
-             2024.10   79.42   93.5    94.11
-             2024.10   58.54   88.17   89.43
-             2024.10   58.25   86.23   86.61