
MultiArith

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Arithmetic Reasoning | MultiArith | Accuracy | 100 | 181 |
| Mathematical Reasoning | MultiArith | Accuracy | 100 | 116 |
| Arithmetic Reasoning | MultiArith (test) | Accuracy | 99.3 | 67 |
| Mathematical reasoning | MultiArith Out of Distribution | Top-1 Accuracy (Maj@1) | 100 | 30 |
| Math Reasoning | MultiArith | Accuracy | 98.3 | 14 |
| Follow-up Questioning Consistency | MultiArith (unseen) | Average Success Count | 18.33 | 12 |
| Arithmetic Reasoning | MultiArith | Accuracy (format-specific prompt) | 78.7 | 2 |