| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| FOLIO | VERGE Full | Accuracy 89.2 | 119 | 3d ago |
| LogiQA | | Accuracy 50.23 | 98 | 3d ago |
| BBH | UPA | Accuracy 100 | 93 | 3d ago |
| LogiQA (test) | | Accuracy 86 | 92 | 3d ago |
| ReClor (test) | IDOL | Accuracy 80.6 | 87 | 3d ago |
| LogiQA | Qwen3-8B-thinking | Accuracy 80.4 | 84 | 3d ago |
| ProofW | Denser | Accuracy 83.7 | 80 | 3d ago |
| FOLIO (test) | HBLR | Accuracy 95.6 | 58 | 3d ago |
| StrategyQA | | Accuracy 89 | 58 | 3d ago |
| LogiQA | Denser | LogiQA Accuracy 78.9 | 56 | 3d ago |
| CounterBench (test) | FLEx | Accuracy 88.9 | 55 | 3d ago |
| LogiQA (val) | GPT-4-0125-preview | Accuracy 58.37 | 50 | 3d ago |
| ZebraLogic | | Accuracy 98.8 | 48 | 3d ago |
| ReClor (dev) | FOCAL REASONER | Accuracy 0.786 | 46 | 3d ago |
| Sudoku | LLaDA-8B-Instruct w/ DAM | Accuracy 89.2 | 44 | 3d ago |
| LogiQA (dev) | FOCAL REASONER | Accuracy 47.3 | 40 | 3d ago |
| ReClor Hard (test) | Human Performance | Accuracy 87.2 | 37 | 3d ago |
| ProntoQA (test) | HBLR | Accuracy 99.72 | 36 | 3d ago |
| ProofWriter (test) | HBLR | Accuracy 92.32 | 36 | 3d ago |
| ProofWriter | PoT | Accuracy 98.4 | 32 | 3d ago |
| LogiQA-2 | | Accuracy 83.8 | 30 | 3d ago |
| ReClor Easy (test) | FOCAL REASONER | Accuracy 86.4 | 28 | 3d ago |
| BBH (test) | FAA | Top@1 Accuracy 88.29 | 27 | 3d ago |
| ReClor | | Accuracy 60 | 25 | 3d ago |
| AR-LSAT | VERGE Full | Accuracy 91.7 | 24 | 3d ago |