
DynaMath

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | DynaMath | Accuracy: 81.42 | 127 |
| Multimodal Reasoning | DynaMath | Accuracy: 67.2 | 72 |
| Visual Mathematical Reasoning | DynaMath | Accuracy: 81.42 | 45 |
| Multimodal Mathematical Reasoning | DynaMath | Accuracy (DynaMath): 69.2 | 40 |
| Multimodal Mathematical Reasoning | DynaMath | DynaMath Score: 66.4 | 31 |
| Visual Reasoning | DynaMath | Accuracy: 66.48 | 26 |
| Mathematical Reasoning | DynaMath | Pass@1: 83.3 | 25 |
| Mathematical Reasoning | DynaMath DMath | Accuracy: 56.9 | 18 |
| Step-wise Verification | DynaMath | Macro F1: 66.7 | 18 |
| Visual Mathematical Reasoning | DynaMath worst | Score: 34.9 | 16 |
| Mathematical & Geometric Reasoning | DynaMath | Accuracy@8: 73.1 | 16 |
| Dynamic mathematical reasoning | DynaMath (test) | Accuracy: 64.8 | 15 |
| Multimodal Mathematical Reasoning | DynaMath-W | Accuracy: 60.5 | 14 |
| Mathematical Vision Understanding | DynaMath | Accuracy: 66.43 | 12 |
| Math Reasoning | DynaMath | Worst Case Accuracy (WCA): 26.8 | 11 |
| STEM & Puzzle | DynaMath (test) | Accuracy: 83.4 | 11 |
| Mathematical Reasoning | DynaMath | Avg@3: 58.7 | 10 |
| Visual Math Reasoning | DynaMath | Pass@1: 79 | 6 |
| Multimodal Math Reasoning | DynaMath Reasoning | Average Score (DynaMath): 65.3 | 6 |
| Math & Reasoning | DynaMath worst | Score: 41.3 | 6 |
| First Incorrect Step Identification | DynaMath | FISI F1 Score: 26.7 | 6 |