| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | IFEval | IFEval Accuracy | 95 | 625 |
| Instruction Following | IFEval (test) | IFEval Score | 84.8 | 55 |
| Instruction Following | IFEval | Genuine-Followup Rate | 53.97 | 38 |
| Instruction Following | IFEval In-Domain | Precision (L) | 0.871 | 23 |
| Instruction Following Evaluation | IFEval | IFEval Score | 86.69 | 20 |
| Reverse Chain-of-Thought Generation | IFEval | Accuracy | 85.1 | 20 |
| Instruction Following | IFEval loose | Avg@8 | 78.1 | 18 |
| Alignment | IFEval strict prompt | pass@1 | 90.2 | 16 |
| Instruction Following | IFEval | Pass@1 (Strict) | 82.4 | 16 |
| Constraint-following Instruction Evaluation | IFEval | Average Score | 54.4 | 16 |
| Chat | IFEval | Loose Prompt Metric | 48.8 | 15 |
| Instruction Following | IFEval-PT | Instruction Score | 79.33 | 14 |
| Instruction Following | IFEval Spanish (ES) | Strict Accuracy | 82.92 | 13 |
| Instruction Following | IFEval Catalan (CA) | Strict Accuracy | 76.81 | 13 |
| Instruction Following | IFEval Galician (GL) | Strict Accuracy | 79 | 13 |
| Instruction Following | IFEval Basque (EU) | Strict Accuracy | 62.81 | 13 |
| Instruction Following | IFEval EN | Score | 87.64 | 12 |
| Instruction Following | IFEval English (test) | IFEval Accuracy | 93.16 | 10 |
| Instruction Following | IFEval et | Instruction Level Strict Accuracy | 61.4 | 10 |
| Instruction Following | IFEval standard (test) | Punctuation Score | 100 | 10 |
| Alignment | IFEval | IFEval Score | 86.3 | 10 |
| Verifiable Instruction Following | IFEval (test) | Prompt Loose Accuracy | 75.23 | 7 |
| Personalized Interaction | IFEval Synthetic | Personalization Score | 6.58 | 6 |
| Thai output consistency | IFEval-TH | IFEval-TH Score | 99.4 | 6 |
| Other | IFEval | Score | 86.69 | 6 |
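Several rows report IFEval at different granularities: prompt-level vs. instruction-level, and strict vs. loose. The sketch below illustrates how these four variants are commonly computed, assuming the definitions of the original IFEval benchmark. Prompt-level accuracy counts a prompt only if every instruction attached to it is followed. Instruction-level accuracy scores each instruction individually. The loose setting also accepts a response if any relaxed variant passes; the relaxations considered are stripping `*` markdown markers, dropping the first line, and dropping the last line. The check functions (`ends_politely`, `no_commas`) and the helper names (`loose_variants`, `is_followed`, `ifeval_accuracies`) are hypothetical illustrations, not the official evaluation code.

```python
from itertools import product
from typing import Callable

# A "check" returns True if a response satisfies one verifiable instruction.
Check = Callable[[str], bool]


def loose_variants(response: str) -> list[str]:
    """Return the 8 relaxed versions of a response (assumed to mirror IFEval's
    loose transformations): every combination of stripping '*' markers,
    dropping the first line, and dropping the last line."""
    variants = []
    for strip_stars, drop_first, drop_last in product([False, True], repeat=3):
        text = response.replace("*", "") if strip_stars else response
        lines = text.split("\n")
        if drop_first:
            lines = lines[1:]
        if drop_last:
            lines = lines[:-1]
        variants.append("\n".join(lines).strip())
    return variants


def is_followed(check: Check, response: str, loose: bool = False) -> bool:
    if not loose:
        return check(response)
    # Loose: the instruction counts as followed if any non-empty variant passes.
    return any(check(v) for v in loose_variants(response) if v)


def ifeval_accuracies(samples: list[tuple[str, list[Check]]],
                      loose: bool = False) -> tuple[float, float]:
    """Return (prompt_level, instruction_level) accuracy."""
    per_prompt = [[is_followed(c, resp, loose) for c in checks]
                  for resp, checks in samples]
    # Prompt level: a prompt counts only if all of its instructions are followed.
    prompt_acc = sum(all(flags) for flags in per_prompt) / len(per_prompt)
    # Instruction level: each instruction is scored on its own, pooled over prompts.
    flat = [f for flags in per_prompt for f in flags]
    instr_acc = sum(flat) / len(flat)
    return prompt_acc, instr_acc


# Hypothetical instructions, for illustration only.
def ends_politely(r: str) -> bool:
    return r.strip().endswith("Anything else?")


def no_commas(r: str) -> bool:
    return "," not in r


samples = [
    ("Done. Anything else?", [ends_politely, no_commas]),         # both followed
    ("Sure, done. Anything else?", [ends_politely, no_commas]),   # comma violates one
    ("Done. Anything else?\nBye", [ends_politely]),               # stray trailing line
]
print(ifeval_accuracies(samples))              # strict: prompt 1/3, instruction 3/5
print(ifeval_accuracies(samples, loose=True))  # loose: prompt 2/3, instruction 4/5
```

In this toy example the loose metric forgives the stray trailing line but not the comma, which is why loose scores sit at or above strict ones. Individual rows in the table (for example Avg@8 or pass@1) may aggregate over samples differently, so values from different rows are not necessarily comparable.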