SOTA General Capability Evaluation benchmarks and papers with code

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Last Updated
tinyBenchmarks	AAS	AI2_arc Accuracy	90	48	5mo ago
General Capability Suite (MMLU, GSM8K, HumanEval, IFEval)	NOVA	Common Average Score	77.78	39	1mo ago
General Capability Suite (ARC-C, HellaSwag, MMLU, GSM8K)		ARC-C Accuracy	54.27	27	1mo ago
General Capability Suite	TELLME	Average Score	71	12	1mo ago
Capability Benchmarks	GCWM	Score	74.32	10	2mo ago
General Capability Dataset		General Score	66.8	10	1mo ago
Voicebench	Kimi-Audio	HS Score	76.91	8	4mo ago
Tülu General Benchmarks 3		MMLU	45	6	4mo ago
Average (MMLU, GSM8K, MBPP)	Baseline	Accuracy	78.84	2	4mo ago