| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Math & Logic | MUSR | MUSR Performance: 42.12 | 24 |
| Reasoning | MuSR (test) | Accuracy: 73.9 | 14 |
| Multistep Soft Reasoning | MUSR | Accuracy (%): 43.1 | 12 |
| Reasoning | MuSR | Accuracy: 71.89 | 11 |
| Multi-hop Reasoning | MuSR | Accuracy: 43.12 | 10 |
| Adding Mistake | MuSR | AOC: 0.731 | 7 |
| Truncated CoT Answering | MuSR | AOC: 33.6 | 7 |
| Multistep Reasoning | MUSR | Accuracy: 61.67 | 7 |
| Multistep Reasoning | MUSR-fr | Average Score: 33.79 | 6 |