| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | NuminaMath | Accuracy | 60.9 | 39 |
| Mathematical Reasoning | NuminaMath | Accuracy | 78.67 | 20 |
| Mathematical Reasoning | NuminaMath | Math Accuracy | 62.45 | 18 |
| Mathematical Reasoning | NuminaMath (val) | Accuracy | 21.7 | 8 |
| LLM Inference | NuminaMath | Throughput (RPS) | 12.8 | 6 |
| Reward Prediction | NuminaMath (out-of-domain) | Accuracy | 42.22 | 6 |
| Formal Theorem Proving | NuminaMath-LEAN (total) | Accuracy | 51 | 4 |
| Formal Theorem Proving | NuminaMath LEAN (unsolved) | Accuracy | 26 | 4 |
| Formal Theorem Proving | NuminaMath-LEAN (solved-H) | Accuracy | 47 | 4 |
| Formal Theorem Proving | NuminaMath-LEAN (solved-K) | Accuracy | 100 | 4 |
| Formal Theorem Proving | NuminaMath LEAN (In-domain) | Average Token Cost | 1,707.19 | 4 |
| Mathematical Reasoning | NuminaMath subset of 5,000 samples | Accuracy | 73.94 | 3 |