Alpaca Eval

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result	
Helpfulness	Alpaca Eval	Alpaca Eval (%)	90	42
Instruction Following Evaluation	Alpaca-Eval	Length-Controlled Win Rate	62.17	8
Chat Performance	Alpaca-Eval	Score	55.8	6
Instruction Following	Alpaca-Eval (test)	Length-Controlled Winrate	66.85	6
Instruction Following	Alpaca Eval 0-shot	Comparison Score (CS)	0.554	4
