| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Logical Reasoning | ProofWriter | Accuracy | 98.4 | 44 |
| Logical Reasoning | ProofWriter (test) | Accuracy | 92.32 | 36 |
| Logical Reasoning | ProofWriter | Accuracy | 99.7 | 24 |
| Logical Reasoning | ProofWriter | Accuracy | 68.2 | 22 |
| Deductive Reasoning | ProofWriter | End-to-end Accuracy | 99.67 | 21 |
| Deductive Reasoning | ProofWriter | Pass@1 | 97.4 | 18 |
| Reasoning quality evaluation | ProofWriter | Somers' D | 0.339 | 15 |
| Explanation Refinement | ProofWriter | Initial Score | 92 | 15 |
| Reasoning | ProofWriter | Accuracy | 65 | 14 |
| Logical Reasoning | ProofWriter (held-out) | Performance | 0.5483 | 14 |
| Deductive logical reasoning | ProofWriter (test) | ExcRate | 100 | 12 |
| Logical Reasoning | ProofWriter depth-5 OWA setting | Accuracy (ProofWriter d5 OWA) | 71.95 | 8 |
| Deductive Reasoning | ProofWriter | Calibrated Accuracy | 92.1 | 8 |
| Logical Reasoning | ProofWriter | Accuracy | 81.3 | 7 |
| Deductive logical reasoning | ProofWriter 600 records (test) | Exc. Rate | - | 0 |