
We-Math

Benchmarks

Task Name                          | Dataset Name        | SOTA Result       | Trend
Mathematical Reasoning             | We-Math mini (test) | Accuracy: 86.4    | 31
Math Reasoning                     | We-Math             | Avg Pass@8: 85.6  | 26
Multi-step mathematical reasoning  | We-Math (test)      | S1 Score: 72.8    | 20
Mathematical & Geometric Reasoning | We-Math             | Accuracy@8: 77.7  | 16