| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | IFBench | Pass@1 (Strict) | 82.9 | 72 |
| Instruction Following | IFBench | Accuracy | 76.5 | 33 |
| Instruction Following | IFBench | IFBench Score | 43.28 | 27 |
| Instruction Following | IFBench | Prompt-level Accuracy | 34.69 | 21 |
| Instruction Following | IFBench | IFBench Score | 75.4 | 19 |
| Reward Modeling | IFBench | Accuracy | 69.3 | 17 |
| Reward Modeling | IFBench Hard | Accuracy | 78 | 16 |
| Reward Modeling | IFBench Normal | Accuracy | 80.5 | 16 |
| Reward Modeling | IFBench Simple | Accuracy | 87.2 | 16 |
| Instruction Following | IFBench | IFBench Score | 39.46 | 12 |
| Instruction Following | IFBench | Exact Match (EM) | 65 | 7 |
| Alignment | IFBench | Pass@1 | 41.7 | 7 |
| Reward Modeling | IFBench (test) | Accuracy | 57.9 | 7 |
| Instruction Following | IFBench | Pass@1 | 27 | 6 |
| Instruction Following | IFBench | Genuine Followup Rate | 9.7 | 6 |
| General Task (Agentic Coding) | IFBench | Score | 77.1 | 6 |
| Instruction Following | IFBench (test) | Score | 38.61 | 5 |
| Instruction Following | IFBench Strict | Avg@10 | 31.5 | 2 |