| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Helpfulness | Alpaca Eval | Alpaca Eval (%) | 17.77 | 22 |
| Chat Performance | Alpaca-Eval | Score | 55.8 | 6 |
| Instruction Following | Alpaca-Eval (test) | Length-Controlled Winrate | 66.85 | 6 |
| Instruction Following | Alpaca Eval 0-shot | Comparison Score (CS) | 0.554 | 4 |