| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical reasoning | MathQA | Accuracy | 90 | 95 |
| Math Word Problem solving | MathQA (test) | Accuracy | 81.5 | 34 |
| Mathematical Reasoning | MathQA (test) | Accuracy | 87.6 | 33 |
| Correctness Prediction | MathQA | Accuracy | 66.15 | 18 |
| Question Answering | MathQA (test) | Accuracy | 81.05 | 16 |
| Question Answering | MathQA | Accuracy | 78.7 | 12 |
| Math Programming | MathQA Python | Pass@80 | 87.4 | 8 |
| Zero-shot Reasoning | MathQA | Accuracy | 28.4 | 7 |
| Downstream Task | MathQA | Accuracy | 24.32 | 7 |
| Numerical Question Answering | MathQA (test) | Program Accuracy | 83 | 6 |
| Common Sense Reasoning | MathQA | Accuracy | 64 | 4 |
| Code Generation | MathQA Python Original (test) | Pass@80 | 84.7 | 4 |
| Human Evaluation | MathQA | Accuracy | 89.2 | 3 |
| Code Generation | MathQA Python Filtered (dev) | Pass@1 | 20.7 | 3 |