
MMLU, ScienceQA, GSM8K, and HumanEval

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Continual instruction tuning | MMLU, ScienceQA, GSM8K, and HumanEval | Average Accuracy: 74.6 | 3