
WeMath

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | WeMath | Accuracy 80.6 | 225 |
| Multimodal Math Reasoning | WeMath | Accuracy 98.7 | 211 |
| Multimodal Reasoning | WeMath | Accuracy 72.2 | 171 |
| Visual Mathematical Reasoning | WeMath | Accuracy 98.7 | 149 |
| Image Reasoning | WeMath | Accuracy 71.3 | 34 |
| Mathematical multi-modal reasoning | WeMath | Pass@1 85.11 | 30 |
| Mathematical Reasoning | WeMath 525 samples | Accuracy 78.5 | 24 |
| Visual Mathematical Reasoning | WeMath strict | Accuracy 44.8 | 18 |
| Multimodal Mathematical Reasoning | WeMath mini (test) | Accuracy 79.5 | 18 |
| Step-wise Verification | WeMath | Macro F1 63.9 | 18 |
| Multimodal Mathematical Reasoning | WeMath (test) | Accuracy 72.15 | 17 |
| Visual Reasoning | WeMath strict | Score 39 | 12 |
| Visual Mathematical Reasoning | WeMath Loose | Score 79 | 10 |
| Multimodal Mathematical Reasoning | WeMath | WeMath-S Score 36.33 | 8 |
| Multimodal Scientific Reasoning | WeMath | Accuracy 71.77 | 8 |
| Mathematical reasoning | WeMath-L | Score 82.19 | 6 |
| Mathematical reasoning | WeMath-S | Score 68.86 | 6 |
| Multidisciplinary Reasoning | WeMath | Accuracy 61.6 | 6 |
| Mathematical Reasoning | WeMath loose | Accuracy 52.1 | 6 |
| First Incorrect Step Identification | WeMath | FISI F1 Score 24.9 | 6 |
| Multimodal Mathematical Reasoning | WeMath 19 | Macro Average Score 61.52 | 2 |