| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | IFEval | Accuracy (0-100) | 94 | 292 |
| Instruction Following | IFEval (test) | IFEval Score | 81.17 | 45 |
| Reverse Chain-of-Thought Generation | IFEval | Accuracy | 85.1 | 20 |
| Alignment | IFEval strict prompt | pass@1 | 90.2 | 16 |
| Constraint-following Instruction Evaluation | IFEval | Average Score | 54.4 | 16 |
| Chat | IFEval | Loose Prompt Metric | 48.8 | 15 |
| Instruction Following | IFEval standard (test) | Punctuation Score | 100 | 10 |
| Alignment | IFEval | IFEval Score | 86.3 | 10 |
| Instruction Following | IFEval | Pass@1 (Strict) | 64.93 | 8 |
| Verifiable Instruction Following | IFEval (test) | Prompt Loose Accuracy | 75.23 | 7 |
| Instruction Following Evaluation | IFEval | IFEval Score | 69.9 | 6 |
| Personalized Interaction | IFEval Synthetic | Personalization Score | 6.58 | 6 |
| Thai output consistency | IFEval-TH | IFEval-TH Score | 99.4 | 6 |
| Other | IFEval | Score | 86.69 | 6 |
| Agent & Alignment | IFEval strict-prompt | Score | 83.73 | 5 |
| Instruction Following | IFEval v1 (test) | Accuracy | 72.1 | 4 |
| Instruction Following Evaluation | IFEval (dev) | Accuracy | 92 | 3 |
| Instruction Following | IFEval 1.0 (full) | Pass@1 | 446 | 2 |
| Instruction Following | IFEval strict instance | Accuracy | 25.06 | 2 |
| Instruction Following | IFEval TH | Overall Score | 80.47 | 2 |
| Instruction Following | IFEval EN | Score | 87.64 | 2 |