
DynaMath

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | DynaMath | Accuracy | 81.42 | 75 |
| Multimodal Reasoning | DynaMath | Accuracy | 67.2 | 58 |
| Visual Mathematical Reasoning | DynaMath | Accuracy | 81.42 | 45 |
| Multimodal Mathematical Reasoning | DynaMath | Accuracy (DynaMath) | 69.2 | 28 |
| Visual Reasoning | DynaMath | Accuracy | 66.48 | 26 |
| Mathematical Reasoning | DynaMath DMath | Accuracy | 56.9 | 18 |
| Step-wise Verification | DynaMath | Macro F1 | 66.7 | 18 |
| Visual Mathematical Reasoning | DynaMath worst | Score | 34.9 | 16 |
| Mathematical & Geometric Reasoning | DynaMath | Accuracy@8 | 73.1 | 16 |
| Dynamic mathematical reasoning | DynaMath (test) | Accuracy | 64.8 | 15 |
| Multimodal Mathematical Reasoning | DynaMath-W | Accuracy | 60.5 | 14 |
| STEM & Puzzle | DynaMath (test) | Accuracy | 83.4 | 11 |
| Multimodal Mathematical Reasoning | DynaMath | DynaMath Score | 66.4 | 10 |
| Mathematical Reasoning | DynaMath | Avg@3 | 58.7 | 10 |
| Mathematical Reasoning | DynaMath | Pass@1 | 51.01 | 9 |
| Math & Reasoning | DynaMath worst | Score | 41.3 | 6 |
| First Incorrect Step Identification | DynaMath | FISI F1 Score | 26.7 | 6 |