| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reverse Chain-of-Thought Generation | MultiChallenge | Score 45 | 20 |
| Instruction Following | MultiChallenge | Score 65.98 | 10 |
| General-purpose Behavior | MultiChallenge | Score 58.6 | 7 |
| Medical Instruction Following | MultiChallenge | Pass@1 66.8 | 4 |