
BBH, GSM8K, MMLU, TruthfulQA, HumanEval, MBPP

Benchmarks

Task Name: General Capability
Dataset Name: BBH, GSM8K, MMLU, TruthfulQA, HumanEval, MBPP
SOTA Result: Average Score 26.77
Trend: 30