| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| General Reasoning | Big-Bench Hard (BBH) (val) | Accuracy: 43.46 | 36 |
| Word Sorting | Big-Bench Hard Word Sorting | Success Rate: 79.8 | 4 |
| Counting | Big-Bench Hard Counting | Success Rate: 91.9 | 4 |
| Temporal Reasoning | BIG-bench Hard Temporal Sequences (test) | Test Accuracy: 62 | 4 |