| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Hierarchical Reasoning | ListOps Long Range Arena (test) | Accuracy | 63.04 | 26 |
| Hierarchical reasoning on symbolic sequences | Long ListOps (test) | Accuracy | 62.75 | 22 |
| Sequence Classification | ListOps | Accuracy (%) | 43.2 | 13 |
| Logical Expression Evaluation | ListOps-O Argument Generalization (Arguments 15) | Accuracy | 79 | 11 |
| Logical Expression Evaluation | ListOps-O Argument Generalization (Arguments 10) | Accuracy | 0.8415 | 11 |
| Logical Expression Evaluation | ListOps-O Length Generalization (Lengths 900-1000) | Accuracy | 99.5 | 11 |
| Logical Expression Evaluation | ListOps-O Length Generalization (Lengths 500-600) | Accuracy | 99.4 | 11 |
| Logical Expression Evaluation | ListOps-O Length Generalization (Lengths 200-300) | Accuracy | 99.9 | 11 |
| Logical Expression Evaluation | ListOps-O near-IID (Lengths < 1000, Arguments < 5) | Accuracy | 99.9 | 11 |
| Mathematical Expression Evaluation | ListOps Long Range Arena (test) | Accuracy | 41.4 | 7 |
| Long-range sequence modeling | ListOps Long Range Arena (LRA) 2K (test) | Accuracy | 37.9 | 6 |
| Long-range sequence modeling | ListOpsMix (test) | Accuracy | 70.43 | 5 |
| Unsupervised Parsing | ListOps (test) | Accuracy | 68.07 | 5 |
| Unsupervised Parsing | ListOps (val) | Accuracy | 67.65 | 5 |
| Unsupervised Parsing | ListOps simplified (test) | Accuracy (Max) | 93.78 | 4 |
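Most rows above score a model on evaluating nested ListOps expressions to a single digit. For readers unfamiliar with the task, the following is a minimal reference evaluator, a sketch only. It assumes the commonly described operator set (`[MAX`, `[MIN`, `[MED` for median, `[SM` for sum modulo 10), space-separated tokens, and no extra bracketing parentheses (some dataset releases include them). Taking the *lower* median for an even number of arguments is also an assumption; the original data generator may break ties differently. The name `evaluate_listops` is hypothetical.

```python
from statistics import median_low

# Assumed operator semantics for ListOps (hypothetical reference, not the
# official generator). MED uses the lower median as an assumed tie-break.
OPS = {
    "[MAX": max,
    "[MIN": min,
    "[MED": median_low,
    "[SM": lambda args: sum(args) % 10,
}


def evaluate_listops(expression: str) -> int:
    """Evaluate a space-tokenized ListOps expression to a single digit."""
    stack = []  # each frame: (operator_fn, list of already-evaluated args)
    result = None
    for token in expression.split():
        if token in OPS:
            # Opening an operator starts a new argument list.
            stack.append((OPS[token], []))
        elif token == "]":
            # Closing bracket: apply the operator and pass the value upward.
            if not stack:
                raise ValueError("unbalanced expression")
            op, args = stack.pop()
            value = op(args)
            if stack:
                stack[-1][1].append(value)
            else:
                result = value
        elif token.isdigit():
            if stack:
                stack[-1][1].append(int(token))
            else:
                result = int(token)
        else:
            raise ValueError(f"unexpected token: {token!r}")
    if stack or result is None:
        raise ValueError("unbalanced expression")
    return result


print(evaluate_listops("[MAX 2 9 [MIN 4 7 ] 0 ]"))  # prints 9
print(evaluate_listops("[SM 5 8 [MED 1 3 9 ] ]"))   # prints 6
```

The explicit stack mirrors the tree structure that the "Hierarchical Reasoning" and "Unsupervised Parsing" rows probe: a model must effectively track this nesting across long sequences to predict the final digit.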