| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | ASDiv | Accuracy 0.955 | 221 |
| Arithmetic reasoning | ASDiv | Accuracy 93.5 | 54 |
| Mathematical Reasoning | ASDiv (test) | Accuracy 97.24 | 38 |
| Mathematical reasoning | ASDiv Out of Distribution | Top-1 Accuracy (maj@1) 89.1 | 35 |
| Mathematical Reasoning | ASDiv | Pass@1 91.3 | 26 |
| Mathematical Reasoning | ASDiv Aug (test) | Accuracy 88.9 | 25 |
| Mathematical Reasoning | ASDiv | AUROC 66.91 | 20 |
| Correctness Prediction | ASDiv | Accuracy 96.74 | 18 |
| Mathematical Reasoning | ASDiv-Aug | Accuracy 92.14 | 15 |
| Math Reasoning | ASDiv (held-out) | Performance 75.52 | 14 |
| Math Reasoning | ASDiv A (test) | Accuracy 91 | 14 |
| Math Word Problem solving | ASDiv-A (5-fold cross-val) | Accuracy 87.5 | 7 |
| Math Word Problem Reasoning | ASDiv | AUROC 0.6818 | 6 |
| Question Answering | ASDiv | AUROC 66.91 | 6 |
| Mathematical Reasoning | ASDIV | Solve Rate 79.6 | 6 |
| Arithmetic Reasoning | ASDiv (5-fold cross val) | Accuracy 73.9 | 3 |