| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Logical Reasoning | FOLIO | Accuracy | 89.2 | 119 |
| Logical Reasoning | FOLIO (test) | Accuracy | 95.6 | 58 |
| Natural Language Inference | FOLIO | Accuracy | 0.61 | 26 |
| NL-to-FOL Syntax Correctness | FOLIO (test) | Syntax Correctness Rate | 99 | 26 |
| First-Order Logic Reasoning | FOLIO | Pass@1 Success Rate | 84.7 | 18 |
| Binary Classification | FOLIO | Accuracy | 81 | 18 |
| Logical Reasoning | FOLIO-wiki-curated (test) | Accuracy | 98.04 | 17 |
| Explanation Refinement | FOLIO | Initial Score | 85.25 | 15 |
| Deductive logical reasoning | FOLIO 203 (dev) | Exclusion Rate | 6.4 | 12 |
| Adding Mistake | FOLIO | AOC | 0.714 | 7 |
| Truncated CoT Answering | FOLIO | AOC | 0.35 | 7 |
| First-Order Logic translation | FOLIO (test) | BLEU | 66 | 7 |
| Logical Reasoning | FOLIO (val) | Accuracy | 69.12 | 5 |
| Logical reasoning | FOLIO | Optimization-phase Token Usage | 453 | 3 |
| Logical Reasoning | FOLIO | Accuracy | 48 | 2 |