| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Logical Reasoning | BBEH | Accuracy 58.9 | 21 |
| General Reasoning | BBEH | Accuracy 78.8 | 19 |
| Reasoning | BBEH (test) | Accuracy 34.5 | 14 |
| LLM Routing | BBEH (val) | Top-1 Acc 66.4 | 14 |
| LLM Routing | BBEH | Top-1 Accuracy 66.4 | 14 |
| Reasoning | BBEH | pass@1 15.31 | 11 |
| Adding Mistake | BBEH | AOC 67.2 | 7 |
| Truncated CoT Answering | BBEH | AOC 0.665 | 7 |