| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | Game of 24 | Accuracy: 758 | 103 |
| Mathematical Reasoning | Game of 24 (test) | Accuracy: 98 | 35 |
| Arithmetic reasoning (multi-solution) | Game of 24 4nu 137 (test) | Multi Solution Accuracy: 76.25 | 12 |
| Explorative Reasoning | Game of 24 (test) | Accuracy: 80 | 11 |
| Arithmetic Reasoning | Game of 24 | Performance: 85.3 | 11 |
| Arithmetic Reasoning | Game of 24 95 (test) | Success Rate: 100 | 9 |
| Game of 24 | Game of 24 100 tasks GPT-4 | Success Rate: 74 | 8 |
| Arithmetic Reasoning | Game of 24 (test) | Success Rate: 90 | 7 |
| Mathematical Reasoning | Game of 24 | pass@1: 0.84 | 6 |
| Arithmetic Planning | Game of 24 | Accuracy: 86 | 4 |
| Reasoning | Game of 24 | Inference Time (s): 60.93 | 4 |
| Mathematical Reasoning | Game of 24 | pass@1: 100 | 4 |