| Dataset Name | SOTA Method | Metric (Value) | Trend | | |
|---|---|---|---|---|---|
| MATH | RM-Regen | Accuracy 80.6 | 46 | 25d ago | |
| Word Sorting | SELF-THOUGHT | Acc@t11 | 24 | 1mo ago | |
| GSM8K | RM-Regen | Accuracy 92.6 | 23 | 25d ago | |
| BBEH Mini | | Accuracy 17.8 | 11 | 10d ago | |
| Algorithmic Reasoning Suite Unseen Length (test) | sin/cos (Randomized) | Even Pairs 100 | 11 | 1mo ago | |
| CLRS | MPNN | BFS Success Rate 99.8 | 9 | 1mo ago | |
| Big-Bench Hard Word Sorting and Multi-step Arithmetic (test) | StrategyLLM | WS Accuracy 80 | 7 | 1mo ago | |
| MOST RELIABLE PATH 100 nodes | NE++ | Key Accuracy 579 | 6 | 1mo ago | |
| MOST RELIABLE PATH 50 nodes | NE++ | Key Identification Accuracy 3.04 | 6 | 1mo ago | |
| MOST RELIABLE PATH 20 nodes | NE | Key Accuracy 17.3 | 6 | 1mo ago | |
| BELLMAN-FORD 100 nodes | NE | Key Value 1,980,000 | 6 | 1mo ago | |
| BELLMAN-FORD 50 nodes | NE | Key Path Identification Count 59 | 6 | 1mo ago | |
| BELLMAN-FORD 20 nodes | NE++ | Key Value 0.0025 | 6 | 1mo ago | |
| CLRS-30 n=64 (test) | FloydNet | Sort Accuracy 100 | 6 | 1mo ago | |
| Algorithmic Tasks Length Generalization, l=41-120 1.0 (test) | minGRU | PC 0.07 | 5 | 1mo ago | |
| CLRS-30 (test) | Hint-ReLIC | Kruskal MST Accuracy 96.01 | 5 | 1mo ago | |
| 86 Algorithmic Reasoning Tasks (overall) | PRIME | Average Accuracy 93.8 | 2 | 1mo ago | |