
We-Math

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Multi-step mathematical reasoning | We-Math (test) | S1 Score: 72.8 | 20
Math Reasoning | We-Math | Pass@1: 76.4 | 19
Mathematical & Geometric Reasoning | We-Math | Accuracy@8: 77.7 | 16
Mathematical Reasoning | We-Math mini (test) | Accuracy: 66.1 | 13