| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | IFEval | IFEval Accuracy | 95 | 836 |
| Instruction Following | IFEval | Accuracy (IFEval) | 90.39 | 89 |
| Instruction Following | IFEval (test) | IFEval Score | 84.8 | 88 |
| Instruction Following | IFEval | IFEval Score | 94.64 | 87 |
| Instruction Following | IFEval | Genuine-Followup Rate | 87.5 | 65 |
| Instruction Following | IFEval | Avg. Score (IFEval) | 88.92 | 45 |
| Instruction Following | IFEval | Win Rate | 81.2 | 36 |
| Dialogue | IFEval | IFEval | 80.2 | 34 |
| Instruction Following | IFEval Inverse | Accuracy | 83.7 | 33 |
| Instruction Following Evaluation | IFEval | IFEval Score | 86.69 | 32 |
| Instruction Following | IFEval | IF-RT Score | 76.26 | 30 |
| Instruction Following | IFEval v1 (test) | Accuracy | 79.85 | 28 |
| Instruction Following | IFEval | Avg@k | 64.23 | 27 |
| Alignment | IFEval strict prompt | pass@1 | 90.2 | 26 |
| Instruction Following | IFEval | IFEval Score | 86.5 | 25 |
| Instruction Following Evaluation | PPE-IFEval | Score | 76 | 24 |
| Text generation | IFEval | Accuracy | 74.49 | 23 |
| Instruction Following | IFEval In-Domain | Precision (L) | 0.871 | 23 |
| Instruction Following | IFEval | Average Score | 47.67 | 21 |
| Instruction Following | IFEval | Strict Accuracy | 90 | 21 |
| Instruction Following | IFEval | IFScore | 84.63 | 21 |
| Instruction Following | IFEval | Improvement Score | 16.57 | 20 |
| Reverse Chain-of-Thought Generation | IFEval | Accuracy | 85.1 | 20 |
| Instruction-following | IFEval | IFEval Score | 96.3 | 18 |
| Instruction Following | IFEval | Score (%) | 62.48 | 18 |