| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Instruction Following | MultiChallenge (Out-of-Domain) | Overall Score 38.5 | 23 |
| Reverse Chain-of-Thought Generation | MultiChallenge | Score 45 | 20 |
| Instruction Following | MultiChallenge | Score 65.98 | 10 |
| General-purpose Behavior | MultiChallenge | Score 58.6 | 7 |
| Multi-turn Dialogue Reasoning | MultiChallenge | Accuracy 32.97 | 4 |
| Medical Instruction Following | MultiChallenge | Pass@1 66.8 | 4 |
| Relational Understanding | MultiChallenge | IM Score 20.35 | 2 |