
BrUMO

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | BRUMO 2025 | Accuracy 93.3 | 52 |
| Mathematical Reasoning | BrUMO '25 | Rank 1 | 40 |
| Mathematical Reasoning | BRUMO (DEF.) | Pass@128 66.67 | 30 |
| Reasoning | BRUMO 2025 | Accuracy 60.83 | 21 |
| Reasoning | BRUMO25 | Avg@k Score 78.75 | 20 |
| Mathematical Reasoning | BRUMO | Trace Count 826 | 20 |
| Reasoning | Brumo 25 | Trace Count 613 | 20 |
| Mathematical Reasoning | BRUMO25 | Pass@1 54.4 | 18 |
| Mathematical Reasoning | Brumo | Pass@1 Accuracy 81.9 | 15 |
| Reasoning | BrUMO25 | Pass@1 94.58 | 14 |
| Mathematical Reasoning | BRUMO | Accuracy 80 | 12 |
| Mathematical Reasoning | BRUMO 2025 | PASS@1 69.48 | 11 |
| Mathematical Reasoning | Brumo 2025 | Accuracy 54.79 | 10 |
| Math Reasoning | BRUMO 2025 | Pass@1 30.4 | 8 |
| Mathematical Reasoning | Brumo 2025 | Pass@1 96.4 | 8 |
| Mathematical Reasoning | BRUMO 2025 | Avg@32 Score 63.3 | 8 |
| Math Reasoning | BRUMO'25 | Pass@1 47.39 | 4 |
| Mathematical Reasoning | BRUMO 2025 | Majority@2048 23.3 | 4 |
| Mathematical Reasoning | BRUMO | REST Score 73.3 | 4 |
| Mathematical Reasoning | BRUMO 2025 (test) | Pass@1 Accuracy 56.66 | 4 |
| Mathematical Reasoning | BRUMO 2025 | Pass@1 86.2 | 3 |
| Math Reasoning | BRUMO 2025 | Pass@1 4.79 | 2 |
| Mathematical Reasoning | BRUMO 2025 | Pass@4 51.42 | 2 |
| Ranking Method Evaluation | BrUMO 25 | Mean Kendall's tau_b 0.954 | 1 |
| Ranking Correlation Analysis | BrUMO'25 | Kendall's tau_b (vs. Gold Standard) 0.858 | 1 |