PandaLM

Benchmarks

Task Name	Dataset Name	SOTA Result
LLM-as-a-Judge	PandaLM Human Annotations (test)	Agreement 0.7683	13
LLM Evaluation	PandaLM	Accuracy 78.98	12
Reward Modeling	PandaLM (test)	Accuracy 79.42	5
