| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 97.1 | 983 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 99 | 797 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 97.72 | 751 |
| Mathematical Reasoning | GSM8K | Accuracy (GSM8K) | 97.8 | 358 |
| Mathematical Reasoning | GSM8K | Accuracy | 97.04 | 351 |
| Mathematical Reasoning | GSM8K | Accuracy | 96.21 | 212 |
| Mathematical Reasoning | GSM8K | Speedup (×) | 10.72 | 177 |
| Mathematical Reasoning | GSM8K | Math Score | 96.4 | 171 |
| Arithmetic Reasoning | GSM8K | Accuracy | 97.1 | 155 |
| Math Reasoning | GSM8K (test) | Accuracy | 94.5 | 155 |
| Arithmetic Reasoning | GSM8K (test) | Accuracy | 97.35 | 129 |
| Math Reasoning | GSM8K | Accuracy | 93.8 | 126 |
| Mathematical Reasoning | GSM8K | EM | 97.04 | 115 |
| Mathematical Reasoning | GSM8K | pass@1 | 96.7 | 102 |
| Math Word Problem Solving | GSM8K | Accuracy | 96.8 | 91 |
| Math | GSM8K | Accuracy | 0.95 | 87 |
| Reasoning | GSM8K | Accuracy | 1 | 83 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 89.2 | 79 |
| Mathematical Reasoning | GSM8K (val) | Accuracy | 90.8 | 67 |
| Mathematical Reasoning | GSM8K (test) | HS | 59.6 | 62 |
| Mathematical Reasoning | GSM8K | Accuracy | 89.15 | 57 |
| Mathematical Reasoning | GSM8K | Tau ($\tau$) | 5.39 | 54 |
| Hallucination Detection | GSM8K | AUROC | 90.37 | 53 |
| Math Word Problem Solving | GSM8K official 1.3k set (test) | Accuracy | 93.7 | 53 |
| Hallucination Detection | GSM8K (test) | AUROC (Reference) | 79.01 | 48 |
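
Many rows report Accuracy, EM, or pass@1 on GSM8K. To illustrate how such numbers are commonly derived, here is a minimal sketch that scores predictions by exact match on the final numeric answer and computes the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them are correct. GSM8K reference solutions end with `#### <answer>`. This is not the scoring code behind any entry: the function names are hypothetical, the fallback of taking the last number in free-form model output is an assumed heuristic, and individual entries may normalize answers differently (note that some rows appear to report accuracy as a fraction rather than a percentage).

```python
import math
import re
from typing import Optional, Sequence

# Matches integers/decimals with an optional sign and thousands separators, e.g. "-3", "1,000", "2.5".
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_answer(text: str) -> Optional[str]:
    """Return the final numeric answer in `text`, or None if there is none.

    If a '####' marker is present (GSM8K reference format), the first number
    after the last marker is used. Otherwise the last number in the text is
    used, an assumed heuristic for free-form model outputs.
    """
    if "####" in text:
        matches = _NUMBER.findall(text.rsplit("####", 1)[1])
        answer = matches[0] if matches else None
    else:
        matches = _NUMBER.findall(text)
        answer = matches[-1] if matches else None
    # Strip thousands separators so "1,000" and "1000" compare equal.
    return answer.replace(",", "") if answer is not None else None


def answers_match(pred: Optional[str], ref: Optional[str]) -> bool:
    """Compare two extracted answers numerically, so "18" equals "18.0"."""
    if pred is None or ref is None:
        return False
    return math.isclose(float(pred), float(ref), rel_tol=0.0, abs_tol=1e-6)


def exact_match_accuracy(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Percentage of predictions whose final number matches the reference's."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    if not refs:
        return 0.0
    hits = sum(
        answers_match(extract_final_answer(p), extract_final_answer(r))
        for p, r in zip(preds, refs)
    )
    return 100.0 * hits / len(refs)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples per problem, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


if __name__ == "__main__":
    refs = ["Natalia sold 48 + 24 = 72 clips.\n#### 72", "#### 10"]
    preds = ["She sold 72 clips in total. The answer is 72.", "The answer is 12."]
    print(exact_match_accuracy(preds, refs))  # 50.0
    # With k = 1 the estimator reduces to c / n, here 3 / 10 (up to float rounding).
    print(pass_at_k(n=10, c=3, k=1))
```

With k = 1 the estimator simplifies to c / n, the fraction of correct samples, so pass@1 with a single sample per problem coincides with exact-match accuracy on the final answer.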