| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | Game of 24 | Accuracy 98.3 | 62 |
| Mathematical Reasoning | Game of 24 (test) | Accuracy 98 | 35 |
| Arithmetic reasoning (multi-solution) | Game of 24 4nu 137 (test) | Multi Solution Accuracy 76.25 | 12 |
| Arithmetic Reasoning | Game of 24 95 (test) | Success Rate 100 | 9 |
| Game of 24 | Game of 24 100 tasks GPT-4 | Success Rate 74 | 8 |
| Mathematical Reasoning | Game of 24 | pass@1 0.84 | 6 |
| Mathematical Reasoning | Game of 24 | pass@1 100 | 4 |
| Arithmetic Reasoning | Game of 24 | Metric - | 0 |