| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | AQUA | Accuracy 85.05 | 132 |
| Arithmetic Reasoning | AQuA (test) | Accuracy 74.63 | 58 |
| Hallucination Detection | AQuA | AUROC 0.7822 | 31 |
| Multiple-choice Question Answering | AQuA | Accuracy 87.4 | 31 |
| Arithmetic Reasoning | AQUA | Accuracy 77.1 | 31 |
| Symbolic Reasoning | AQUA | Accuracy 80.3 | 26 |
| Algebraic Reasoning | AQUA | Accuracy 79.1 | 15 |
| Mathematical Reasoning | AQUA (test) | Accuracy 72.44 | 6 |
| Arithmetic Reasoning | AQUA | Accuracy (format-specific prompt) 33.5 | 2 |
| Algebraic Reasoning | AQUA (test) | Accuracy - | 0 |