| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|---|
| Multilingual Mathematical Reasoning | PolyMath (test) | Accuracy (Ar): 20.3 | 30 |
| Translator-call mode selection | PolyMath High | Macro F1: 73.74 | 12 |
| Translator-call mode selection | PolyMath Medium | Macro F1: 75.22 | 12 |
| Translator-call mode selection | PolyMath Low | Macro F1 Score: 87.64 | 12 |
| Mathematical Reasoning | PolyMath | Accuracy: 20.9 | 12 |
| Mathematical Reasoning | PolyMath medium ALL 1.0 (test) | Accuracy: 25.24 | 12 |
| Mathematical Reasoning | PolyMath medium (EN) 1.0 (test) | Accuracy: 32 | 12 |
| Mathematical Reasoning | PolyMath English | Pass@1: 44 | 9 |
| Mathematical Reasoning | PolyMath Full | Accuracy (ar): 18.9 | 7 |
| Multilingual Reasoning | Polymath Low | Accuracy (en): 96.5 | 3 |
| Mathematical Reasoning | Polymath Low | Accuracy (en): 96.8 | 3 |