| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| 200 IMO-level math problems: IMO-AnswerBench, IMO-ProofBench, ArXivMath (test) | Meta-Harness | Pass@1 Accuracy | 50.6 | 36 | 18d ago |
| GSM8K | Bifrost | Solve Rate | 90.22 | 27 | 1mo ago |
| MWPBENCH (out-of-domain) | WizardMath-Mistral-RL | College Math Acc | 24.8 | 26 | 1mo ago |
| AIME 24 | BERT-Judge | Accuracy | 90 | 24 | 5d ago |
| Math Domain (AIME24, Math-OAI, Minerva, Olympiad, AMC23), Qwen2.5-7B (10% selection) | InstructDiff | AIME24 Score | 7.71 | 18 | 1mo ago |
| MATH | Primitives-based MAS | Accuracy | 76.4 | 14 | 1mo ago |
| Math Benchmarks, LIMO curation (test) | LALP | Accuracy | 72.6 | 10 | 2d ago |
| GSM8K, SAT-Math, & MATH, OpenCompass AGIEval sampled (test) | CRITIQ | GSM8K Accuracy | 32.22 | 4 | 1mo ago |