| Dataset Name | SOTA Method | Metric | Score (%) | Trend | Updated |
|---|---|---|---|---|---|
| GSM8K | DMoA | Accuracy | 98.87 | 254 | 1d ago |
| GSM8K (test) | Agent-GWO | Accuracy | 95.9 | 250 | 14d ago |
| AQuA | ProTeGi | Accuracy | 93.55 | 188 | 14d ago |
| MATH | XRPO | Accuracy | 90.54 | 160 | 7d ago |
| GSM8K | | Accuracy (GSM8K) | 100 | 131 | 1d ago |
| GSM8K | S-GRPO | Accuracy | 93.8 | 126 | 3mo ago |
| Gaokao En 2023 | Legislator-Executor | Accuracy | 79 | 109 | 4d ago |
| AMC23 | ReBalance | Pass@1 Accuracy | 100 | 99 | 7d ago |
| AMC | LLM-J | Accuracy | 80 | 95 | 2mo ago |
| SVAMP | SRQ-driven steering | Accuracy | 96.5 | 85 | 1mo ago |
| MATH500 | CE-GPPO | Accuracy | 95.6 | 83 | 13d ago |
| JEEBench | APRM | Accuracy | 74.4 | 82 | 2mo ago |
| OlympiadBench | APRM | Accuracy | 90.7 | 76 | 13d ago |
| GSM Hard | IPOMP | Accuracy | 82.6 | 73 | 7d ago |
| HLE Math-100 | TMAS | Pass@1 | 35.84 | 68 | 21d ago |
| IMO-AnswerBench 50 | TMAS | Pass@1 Accuracy | 40.5 | 68 | 21d ago |
| MATH500 | UMAD | Pass@1 Rate | 87.2 | 66 | 11d ago |
| MultiArith | POES | Accuracy | 98.3 | 65 | 1mo ago |
| AIME 2025 | | Accuracy | 94.6 | 60 | 16d ago |
| MATH 500 | DEPO | Accuracy | 94.4 | 60 | 1mo ago |
| MATH (test) | MAD | Accuracy | 96 | 59 | 21d ago |
| GSM8K | ReBalance | Pass@1 Accuracy | 96.8 | 57 | 22d ago |
| AMC 2023 (test) | Clip-Higher + QAE | Pass@1 | 92.97 | 57 | 29d ago |
| GSM8K | JURY-RL | Pass@4 Accuracy | 95.83 | 54 | 1mo ago |
| MultiArith (test) | ZERA | Accuracy | 99.59 | 54 | 1mo ago |