
Macro Average

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| General Task Performance | Macro-average (Mathematics, Multi-Hop QA, Code Generation) | Accuracy: 69 | 21 |
| Re-identification | Macro Average (Across Datasets) | AUC: 96.9 | 18 |
| Mathematical Reasoning | Macro Average AIME2024, MATH, Minerva, Olympiad-Bench | Pass@1: 55 | 16 |
| Mathematical Reasoning | Macro Average Selected Benchmarks | Pass@1 (Avg@32): 52.8 | 14 |
| Regression | Macro-average SICKR-STS, STS-B, WMT_RU_EN, WMT_EN_ZH, WMT_SI_EN (test) | Pearson Correlation (r): 76.3 | 11 |
| Question Answering and Reasoning | Macro-average (MMLU, MATH, GSM8K, BBH) | Cost Reduction: 46 | 8 |