| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | DeepMind-Mathematics | Accuracy 88.4 | 63 |
| Mathematical Reasoning | DeepMind-Mathematics (test) | Accuracy 64.1 | 27 |
| Mathematical Reasoning | DeepMind-Mathematics | Pass@1 87.1 | 22 |
| Open-form mathematical reasoning | DeepMind MATHEMATICS | Exact-match Accuracy 42.13 | 7 |