| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | WeMath | Accuracy | 80.6 | 75 |
| Visual Mathematical Reasoning | WeMath | Accuracy | 98.7 | 53 |
| Multimodal Reasoning | WeMath | Accuracy | 63.8 | 43 |
| Multimodal Math Reasoning | WeMath | Accuracy | 78 | 26 |
| Step-wise Verification | WeMath | Macro F1 | 63.9 | 18 |
| Mathematical multi-modal reasoning | WeMath | Pass@1 | 85.11 | 13 |
| First Incorrect Step Identification | WeMath | FISI F1 Score | 24.9 | 6 |