| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Logical Reasoning | BigBench Hard Boolean Expressions | Accuracy | 76.8 | 17 |
| Reasoning | BigBench Hard Penguins | Accuracy | 44.1 | 5 |
| Linguistic Reasoning | BigBench Hard Disambiguation QA | Accuracy | 55.1 | 5 |
| Reasoning | BigBench-Hard collection averaged | Ours Accuracy | 45.41 | 4 |