| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Deductive logical reasoning | ProverQA hard (test) | Error Rate | 0 | 12 |
| Linguistically Diverse Reasoning | ProverQA | Accuracy (Easy) | 94 | 8 |
| Logical Reasoning | ProverQA hard split | Accuracy | 0.686 | 8 |
| Deductive logical reasoning | ProverQA OOD hard subset 500 records (test) | Error Rate | - | 0 |