| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Instruction Following | SelfInst | Rouge-L 21.7 | 73 |
| Instruction Following | SelfInst | R-L Score 23.4 | 50 |
| Instruction-tuning | SelfInst | ROUGE-L 21.31 | 21 |
| Instruction Following Evaluation | SelfInst Out-of-Distribution | GPT-4o Score 51.6 | 17 |
| Generation | SelfInst (test) | LLM-as-a-Judge Score 60.16 | 2 |