| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Math Reasoning | WeMath | Accuracy 98.7 | 168 |
| Mathematical Reasoning | WeMath | Accuracy 80.6 | 161 |
| Multimodal Reasoning | WeMath | Accuracy 72.2 | 129 |
| Visual Mathematical Reasoning | WeMath | Accuracy 98.7 | 127 |
| Step-wise Verification | WeMath | Macro F1 63.9 | 18 |
| Multimodal Mathematical Reasoning | WeMath (test) | Accuracy 72.15 | 17 |
| Mathematical multi-modal reasoning | WeMath | Pass@1 85.11 | 13 |
| Multimodal Mathematical Reasoning | WeMath mini (test) | Accuracy 72.6 | 12 |
| Visual Mathematical Reasoning | WeMath Loose | Score 79 | 10 |
| Multimodal Scientific Reasoning | WeMath | Accuracy 71.77 | 8 |
| First Incorrect Step Identification | WeMath | FISI F1 Score 24.9 | 6 |