
MMLU, CMMLU, GSM8k, XSum, and StrategyQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Reasoning and Generative Tasks | MMLU, CMMLU, GSM8k, XSum, and StrategyQA (test) | MMLU Accuracy: 30.77 | 4
Reasoning and Generative Tasks | MMLU, CMMLU, GSM8k, XSum, and StrategyQA | MMLU: - | 0