MMLU, ScienceQA, GSM8K, and HumanEval

Benchmarks

Task Name	Dataset Name	SOTA Result	Trend
Continual instruction tuning	MMLU, ScienceQA, GSM8K, and HumanEval	Average Accuracy: 74.6		3
