PandaLM

Benchmarks

Task Name        Dataset Name                       SOTA Result         Trend
LLM-as-a-Judge   PandaLM Human Annotations (test)   Agreement: 0.7683   13
LLM Evaluation   PandaLM                            Accuracy: 78.98     12