| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Logical Reasoning | FOLIO | Accuracy | 89.2 | 123 |
| Logical Reasoning | FOLIO (test) | Accuracy | 95.6 | 58 |
| Natural Language Inference | FOLIO | Accuracy | 0.61 | 26 |
| NL-to-FOL Syntax Correctness | FOLIO (test) | Syntax Correctness Rate | 99 | 26 |
| First-Order Logic Formalization | FOLIO | Accuracy | 31.53 | 24 |
| Mathematical Reasoning | FOLIO to GSM8K | Accuracy | 95.1 | 18 |
| First-Order Logic Reasoning | FOLIO | Pass@1 Success Rate | 84.7 | 18 |
| Binary Classification | FOLIO | Accuracy | 81 | 18 |
| Logical Reasoning | FOLIO-wiki-curated (test) | Accuracy | 98.04 | 17 |
| Explanation Refinement | FOLIO | Initial Score | 85.25 | 15 |
| Deductive Logical Reasoning | FOLIO 203 (dev) | Exclusion Rate | 6.4 | 12 |
| Logical Reasoning | FOLIO full expert-curated | Accuracy | 79.9 | 8 |
| Adding Mistake | FOLIO | AOC | 0.714 | 7 |
| Truncated CoT Answering | FOLIO | AOC | 0.35 | 7 |
| First-Order Logic Translation | FOLIO (test) | BLEU | 66 | 7 |
| Logical Reasoning | FOLIO (val) | Accuracy | 69.12 | 5 |
| Logical Reasoning | FOLIO FOL fields (val) | Accuracy | 68.1 | 4 |
| Logical Reasoning | FOLIO | Optimization-phase Token Usage | 453 | 3 |
| Logical Reasoning | FOLIO | Accuracy | 48 | 2 |