
MMLU, GSM8K, HumanEval, BBH

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
General Language Model Capability | MMLU, GSM8K, HumanEval, BBH Combined | Average Score: 68.42 | 8