
PandaLM

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| LLM-as-a-Judge | PandaLM Human Annotations (test) | Agreement: 0.7683 | 13 |
| LLM Evaluation | PandaLM | Accuracy: 78.98 | 12 |