| Task Name | Dataset Name | SOTA Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction following | General Domain AlpacaEval Arena-Hard LLaMA3-8B (10% selection) | AlpacaEval Score | 12.09 | 18 |
| Chinese-to-English speech translation | General-domain (test) | BLEU | 40.77 | 6 |
| Question answering | General Domain Average | Average EM | 42.35 | 5 |
| Language modeling | General Domain (holdout test) | L_inf | 0.79 | 4 |