
BrUMO

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | BrUMO '25 | Rank: 1 | 40 |
| Reasoning | BRUMO 2025 | Accuracy: 60.83 | 21 |
| Mathematical Reasoning | BRUMO | Trace Count: 826 | 20 |
| Reasoning | Brumo 25 | Trace Count: 613 | 20 |
| Mathematical Reasoning | BRUMO25 | Pass@1: 54.4 | 18 |
| Reasoning | BrUMO25 | Pass@1: 94.58 | 14 |
| Mathematical Reasoning | BRUMO 2025 | PASS@1: 69.48 | 11 |
| Mathematical Reasoning | BRUMO | Accuracy: 67.5 | 7 |
| Mathematical Reasoning | BRUMO 2025 (test) | Pass@1 Accuracy: 56.66 | 4 |
| Mathematical Reasoning | BRUMO 2025 | Pass@4: 51.42 | 2 |
| Ranking Method Evaluation | BrUMO 25 | Mean Kendall's tau_b: 0.954 | 1 |
| Ranking Correlation Analysis | BrUMO'25 | Kendall's tau_b (vs. Gold Standard): 0.858 | 1 |