
MMLU, CMMLU, GSM8k, XSum, and StrategyQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Reasoning and Generative Tasks | MMLU, CMMLU, GSM8k, XSum, and StrategyQA (test) | MMLU Accuracy: 30.77 | 4
Reasoning and Generative Tasks | MMLU, CMMLU, GSM8k, XSum, and StrategyQA | MMLU: - | 0