
AlpacaEval

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Instruction Following | AlpacaEval 2.0 | LC Win Rate | 3,526 | 281 |
| Instruction Following | AlpacaEval | Win Rate | 97.2 | 125 |
| LLM alignment evaluation | AlpacaEval 2 | LC Win Rate | 49.5 | 72 |
| Instruction Following | AlpacaEval 2.0 (test) | LC Win Rate (%) | 59.93 | 71 |
| Instruction Following and Helpfulness Evaluation | AlpacaEval 2.0 | Win Rate | 49.4 | 58 |
| LLM Alignment Evaluation | AlpacaEval 2.0 (test) | LC Win Rate | 30.35 | 51 |
| Chat | AlpacaEval 2.0 (test) | AlpacaEval (LC win %) | 57.46 | 46 |
| Open-ended Generation | AlpacaEval 2.0 | Win Rate | 648 | 43 |
| Instruction Following | AlpacaEval (test) | Helpfulness Score | 3,213 | 32 |
| General Performance | AlpacaEval | Winrate | 98 | 25 |
| Chat | AlpacaEval | Win Rate | 3,213 | 25 |
| Chat Evaluation | AlpacaEval LC 2 | Score | 74.11 | 23 |
| Open-ended Generation | AlpacaEval 1.0 | Win Rate | 7,904 | 23 |
| Open-ended | AlpacaEval | Win Rate vs Davinci-003 | 93.5 | 22 |
| Instruction Following | AlpacaEval Yoruba | Win Rate (%) | 68.9 | 20 |
| Instruction Following | AlpacaEval Swahili | Win Rate | 83 | 20 |
| Instruction Following | AlpacaEval Indonesian | Win Rate | 64.2 | 20 |
| Instruction Following | AlpacaEval Korean | Win Rate | 77.8 | 20 |
| Instruction Following | AlpacaEval German | Win Rate | 65.2 | 20 |
| Instruction Following | AlpacaEval Chinese | Win Rate | 70.4 | 20 |
| Instruction Following | AlpacaEval Length-controlled | Score | 73.9 | 16 |
| Instruction Following | AlpacaEval v1 (test) | AlpacaEval Score | 97.7 | 14 |
| Instruction-following | AlpacaEval 805 instructions (test) | Win Rate | 79.91 | 14 |
| Instruction Following | AlpacaEval LC 2 | Win Rate | 80.9 | 12 |
| Instruction Following | AlpacaEval Helpsteer2 2 (test) | LC Win Rate | 29.64 | 12 |
Showing 25 of 44 rows