| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | AlpacaEval 2.0 | LC Win Rate | 3,526 | 281 |
| Instruction Following | AlpacaEval | Win Rate | 97.2 | 125 |
| LLM alignment evaluation | AlpacaEval 2 | LC Win Rate | 49.5 | 72 |
| Instruction Following | AlpacaEval 2.0 (test) | LC Win Rate (%) | 59.93 | 71 |
| Instruction Following and Helpfulness Evaluation | AlpacaEval 2.0 | Win Rate | 49.4 | 58 |
| LLM Alignment Evaluation | AlpacaEval 2.0 (test) | LC Win Rate | 30.35 | 51 |
| Chat | AlpacaEval 2.0 (test) | AlpacaEval (LC win %) | 57.46 | 46 |
| Open-ended Generation | AlpacaEval 2.0 | Win Rate | 648 | 43 |
| Instruction Following | AlpacaEval (test) | Helpfulness Score | 3,213 | 32 |
| General Performance | AlpacaEval | Winrate | 98 | 25 |
| Chat | AlpacaEval | Win Rate | 3,213 | 25 |
| Chat Evaluation | AlpacaEval LC 2 | Score | 74.11 | 23 |
| Open-ended Generation | AlpacaEval 1.0 | Win Rate | 7,904 | 23 |
| Open-ended | AlpacaEval | Win Rate vs Davinci-003 | 93.5 | 22 |
| Instruction Following | AlpacaEval Yoruba | Win Rate (%) | 68.9 | 20 |
| Instruction Following | AlpacaEval Swahili | Win Rate | 83 | 20 |
| Instruction Following | AlpacaEval Indonesian | Win Rate | 64.2 | 20 |
| Instruction Following | AlpacaEval Korean | Win Rate | 77.8 | 20 |
| Instruction Following | AlpacaEval German | Win Rate | 65.2 | 20 |
| Instruction Following | AlpacaEval Chinese | Win Rate | 70.4 | 20 |
| Instruction Following | AlpacaEval Length-controlled | Score | 73.9 | 16 |
| Instruction Following | AlpacaEval v1 (test) | AlpacaEval Score | 97.7 | 14 |
| Instruction-following | AlpacaEval 805 instructions (test) | Win Rate | 79.91 | 14 |
| Instruction Following | AlpacaEval LC 2 | Win Rate | 80.9 | 12 |
| Instruction Following | AlpacaEval Helpsteer2 2 (test) | LC Win Rate | 29.64 | 12 |