| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| LogiQA | Denser | LogiQA Accuracy | 78.9 | 251 | 14d ago |
| BBH | UPA | Accuracy | 100 | 249 | 7d ago |
| LogiQA (test) | | Accuracy | 86 | 151 | 1mo ago |
| Sudoku | EGSPO-SA | Accuracy | 94.3 | 142 | 9d ago |
| Formal Logic | UAB | Accuracy | 87.8 | 136 | 7d ago |
| FOLIO | VERGE Full | Accuracy | 89.2 | 126 | 23d ago |
| LogiQA-2 | | Accuracy | 83.8 | 116 | 1mo ago |
| LogicVista | | Accuracy | 81.4 | 113 | 14d ago |
| LogiQA | Qwen3-8B-thinking | Accuracy | 80.4 | 100 | 2mo ago |
| LogiQA | | Accuracy | 50.23 | 98 | 3mo ago |
| ZebraLogic v1.0 (test) | In-place | Cell Accuracy | 97.7 | 90 | 5d ago |
| ZebraLogic (test) | In-place | Grid Accuracy | 92.2 | 90 | 5d ago |
| ReClor (test) | IDOL | Accuracy | 80.6 | 87 | 3mo ago |
| ProofW | Denser | Accuracy | 83.7 | 80 | 3mo ago |
| HLE | | Accuracy | 0.7226 | 62 | 1mo ago |
| AR-LSAT | VERGE Full | Accuracy | 91.7 | 60 | 1mo ago |
| FOLIO (test) | HBLR | Accuracy | 95.6 | 58 | 3mo ago |
| StrategyQA | | Accuracy | 89 | 58 | 3mo ago |
| ProntoQA (test) | HBLR | Accuracy | 99.72 | 57 | 8d ago |
| ProofWriter (test) | HBLR | Accuracy | 92.32 | 57 | 8d ago |
| Stepgame k=10 | LLM-ASP | Accuracy | 88.1 | 56 | 2mo ago |
| Stepgame k=4 | LLM-ASP | Accuracy | 93.8 | 56 | 2mo ago |
| Stepgame k=3 | PoT-LLM | Accuracy | 89.5 | 56 | 2mo ago |
| CounterBench (test) | FLEx | Accuracy | 88.9 | 55 | 3mo ago |
| SLR-BENCH Extended Leaderboard | | LRL Score | 15.5 | 54 | 21d ago |