
MMLU GSM8K

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Utility Evaluation | MMLU, GSM8K | MMLU Accuracy: 70.4 | 16 |
| Reasoning | MMLU GSM8K | MMLU Accuracy: 72.93 | 15 |