| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | SVAMP | Accuracy 97 | 403 |
| Mathematical Reasoning | SVAMP (test) | Accuracy 94 | 293 |
| Performance Estimation | SVAMP | MAE 0 | 198 |
| Arithmetic Reasoning | SVAMP | Accuracy 96.01 | 87 |
| Math Reasoning | SVAMP | Accuracy 96.5 | 85 |
| Arithmetic Reasoning | SVAMP (test) | Accuracy 98.16 | 70 |
| Arithmetic Reasoning | SVAMP | Accuracy (Overall) 93.7 | 54 |
| Hallucination Detection | SVAMP | Mean AUROC 80.6 | 50 |
| Mathematical Reasoning | SVAMP out-of-domain (test) | Accuracy 97 | 50 |
| Mathematical Word Problem Solving | SVAMP | Accuracy 96.94 | 38 |
| Math Word Problem solving | SVAMP | Value Accuracy 94.5 | 38 |
| Mathematical Reasoning | SVAMP (val) | Accuracy 85.1 | 36 |
| Mathematical Reasoning | SVAMP | Pass@1 93.1 | 35 |
| Speculative Sampling | SVAMP | Average Acceptance Length 5.38 | 28 |
| Group Collusive Attack Detection | SVAMP | Detection Accuracy 92 | 27 |
| Mathematical Reasoning | SVAMP 8-shot (test) | Accuracy 92 | 25 |
| Mathematical Reasoning | SVAMP | Avg Forward Passes 5.7 | 24 |
| Mathematical Reasoning | SVAMP | Pass@1 Accuracy 94.2 | 22 |
| Mathematical Reasoning | SVAMP | Accuracy 94.31 | 21 |
| Mathematical Reasoning | SVAMP | Accuracy (SVAMP) 83.8 | 20 |
| Mathematical Reasoning | SVAMP | AUROC 0.6211 | 20 |
| Math Word Problem solving | SVAMP English (test) | Accuracy 67.8 | 20 |
| Mathematical Reasoning | SVAMP | Accuracy 92.33 | 19 |
| Mathematical Reasoning | SVAMP | Pass@5 97 | 16 |
| Mathematical Reasoning | SVAMP | Accuracy 56.8 | 15 |