
Macro Average

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| General Task Performance | Macro-average (Mathematics, Multi-Hop QA, Code Generation) | Accuracy | 69 | 21 |
| Re-identification | Macro-average (across datasets) | AUC | 96.9 | 18 |
| Mathematical Reasoning | Macro-average (AIME2024, MATH, Minerva, Olympiad-Bench) | Pass@1 | 55 | 16 |
| Mathematical Reasoning | Macro-average (selected benchmarks) | Pass@1 (Avg@32) | 52.8 | 14 |
| Question Answering and Reasoning | Macro-average (MMLU, MATH, GSM8K, BBH) | Cost Reduction | 46 | 8 |
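Every result on this page is a macro-average. Under the standard definition, each benchmark is scored separately and the per-benchmark scores are then averaged with equal weight, so a small dataset counts as much as a large one. The Python sketch below illustrates that aggregation and contrasts it with a micro (example-weighted) average. The function names and all sample numbers are hypothetical and do not come from the leaderboard. The `avg_at_k_pass1` helper assumes that "Pass@1 (Avg@32)" means per-problem accuracy over k = 32 sampled answers, averaged across problems; individual papers may define it differently.

```python
from statistics import mean


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean of per-dataset scores: every dataset counts equally."""
    if not scores:
        raise ValueError("need at least one dataset score")
    return mean(scores.values())


def micro_average(scores: dict[str, float], counts: dict[str, int]) -> float:
    """Mean weighted by the number of examples in each dataset (for contrast)."""
    total = sum(counts[name] for name in scores)
    return sum(scores[name] * counts[name] for name in scores) / total


def avg_at_k_pass1(correct: list[list[bool]]) -> float:
    """Assumed Avg@k estimate of Pass@1: per-problem fraction of the k sampled
    answers that are correct, then averaged over all problems."""
    return mean(sum(samples) / len(samples) for samples in correct)


if __name__ == "__main__":
    # Hypothetical scores and dataset sizes, for illustration only.
    scores = {"large_set": 90.0, "small_set": 50.0}
    counts = {"large_set": 900, "small_set": 100}
    print(macro_average(scores))          # 70.0: each dataset weighted equally
    print(micro_average(scores, counts))  # 86.0: dominated by the large dataset

    # Two hypothetical problems with k = 4 samples each (the table uses k = 32).
    print(avg_at_k_pass1([[True, False, True, True], [False, False, False, True]]))  # 0.5
```

The gap between 70.0 and 86.0 in this toy case is why a macro-average is often preferred for multi-benchmark summaries: a strong score on one large benchmark cannot mask weak results on the smaller ones.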