| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | TheoremQA | Accuracy | 43.1 | 55 |
| Theorem-based Reasoning | TheoremQA | Score | 53 | 34 |
| Mathematical Reasoning | TheoremQA (test) | Accuracy | 48.4 | 28 |
| Mathematical Reasoning | TheoremQA | Pass@1 | 34.1 | 18 |
| STEM Reasoning | TheoremQA | Avg@2 | 55.4 | 16 |
| Question Answering | TheoremQA | Accuracy | 15 | 16 |
| Reasoning | TheoremQA | AUROC | 88.87 | 14 |
| Theorem Proving | TheoremQA | Accuracy | 13.5 | 13 |
| Mathematical Problem Solving | TheoremQA TQ-Math | Exact Match Accuracy | 57.7 | 12 |
| Retrieval-Augmented Generation | TheoremQA | Accuracy | 66.3 | 12 |
| Theorem Question Answering | TheoremQA standard (test) | Accuracy | 56 | 12 |
| Scientific Reasoning | TheoremQA (test) | Accuracy | 48.4 | 9 |
| General Reasoning | TheoremQA | Average@2 | 36.3 | 7 |
| Theorem-based Question Answering | TheoremQA | Accuracy | 56.13 | 7 |