WeMath

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multimodal Math Reasoning | WeMath | Accuracy 98.7 | 168 |
| Mathematical Reasoning | WeMath | Accuracy 80.6 | 161 |
| Multimodal Reasoning | WeMath | Accuracy 72.2 | 129 |
| Visual Mathematical Reasoning | WeMath | Accuracy 98.7 | 127 |
| Step-wise Verification | WeMath | Macro F1 63.9 | 18 |
| Multimodal Mathematical Reasoning | WeMath (test) | Accuracy 72.15 | 17 |
| Mathematical multi-modal reasoning | WeMath | Pass@1 85.11 | 13 |
| Multimodal Mathematical Reasoning | WeMath mini (test) | Accuracy 72.6 | 12 |
| Visual Mathematical Reasoning | WeMath Loose | Score 79 | 10 |
| Multimodal Scientific Reasoning | WeMath | Accuracy 71.77 | 8 |
| First Incorrect Step Identification | WeMath | FISI F1 Score 24.9 | 6 |