
Alpaca Eval

Benchmarks

Task Name               Dataset Name          SOTA Result                          Trend
Helpfulness             Alpaca Eval           Alpaca Eval (%): 17.77               22
Chat Performance        Alpaca-Eval           Score: 55.8                          6
Instruction Following   Alpaca-Eval (test)    Length-Controlled Winrate: 66.85     6
Instruction Following   Alpaca Eval 0-shot    Comparison Score (CS): 0.554         4