| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | IFBench | Accuracy | 77.8 | 72 |
| Instruction Following | IFBench | Pass@1 (Strict) | 82.9 | 72 |
| Instruction Following | IFBench | IFBench Score | 76 | 56 |
| General Instruction Following | IFBench | Accuracy | 84.1 | 30 |
| Instruction Following Evaluation | IFBench | Score | 73.2 | 23 |
| Instruction Following | IFBench | Prompt-level Accuracy | 34.69 | 21 |
| Instruction Following | IFBench | IFBench Score | 79.79 | 19 |
| Instruction Following | IFBench | IFBench Score | 75.4 | 19 |
| Instruction Following | IFBench | Accuracy | 36 | 18 |
| Reward Modeling | IFBench | Accuracy | 69.3 | 17 |
| Instruction Following | IFBench (test) | Score | 55.95 | 16 |
| Reward Modeling | IFBench Hard | Accuracy | 78 | 16 |
| Reward Modeling | IFBench Normal | Accuracy | 80.5 | 16 |
| Reward Modeling | IFBench Simple | Accuracy | 87.2 | 16 |
| Instruction Following | IFBench-I | Accuracy | 73.96 | 15 |
| Instruction Following | IFBench-P | Accuracy | 17.4 | 10 |
| Instruction Following | IFBench | IFBench Score | 23.88 | 9 |
| Instruction Following | IFBench | Pr. (S) | 57.3 | 8 |
| Instruction Following | IFBench | LLM Throughput (tokens/s) | 2,688 | 8 |
| Instruction Following | IFBench | Exact Match (EM) | 65 | 7 |
| Alignment | IFBench | Pass@1 | 41.7 | 7 |
| Reward Modeling | IFBench (test) | Accuracy | 57.9 | 7 |
| Instruction Following | IFBench | Pass@1 | 27 | 6 |
| Instruction Following | IFBench | Genuine Followup Rate | 9.7 | 6 |
| General Task (Agentic Coding) | IFBench | Score | 77.1 | 6 |