| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Logical Reasoning | CLUTRR | Accuracy 95.9 | 42 |
| Logical Reasoning | CLUTRR (test) | Accuracy 80.1 | 35 |
| Hybrid Reasoning | CLUTRR (test) | Accuracy 76.4 | 24 |
| Inductive Reasoning | Clutrr | Pass@1 95.5 | 18 |
| Binary Classification | CLUTRR | Accuracy 78 | 18 |
| Logical Reasoning | CLUTRR rob_train_disc_23_all (test) | Accuracy 41.6 | 3 |
| Logical Reasoning | CLUTRR rob train irr 23 all (test) | Accuracy 34.5 | 3 |
| Logical Reasoning | CLUTRR rob_train_sup_23_all (test) | Accuracy 45.2 | 3 |
| Logical Reasoning | CLUTRR rob train clean 23 all (test) | Accuracy 35.6 | 3 |
| Logical Reasoning | CLUTRR gen_train234_test2to10 | Accuracy 25 | 3 |
| Logical Reasoning | CLUTRR gen_train23_test2to10 | Accuracy 24 | 3 |