General Reasoning

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Last Updated
MMLU-Pro	AFlow	Accuracy	82.3	213	1mo ago
BBH	SABA	Accuracy	93.2	190	1mo ago
MMLU	M2CL	MMLU Accuracy	95.1	180	1mo ago
BBH		BBH General Reasoning Accuracy	94.6	117	1mo ago
MMLU-Pro		pass@1 Accuracy	73.44	115	1mo ago
Super GPQA	Gemini 2.5 Pro	Accuracy	71.1	99	1mo ago
StratQA	Process Supervision	Accuracy	87.8	91	4mo ago
BBEH		Accuracy	78.8	76	1mo ago
Out-of-Distribution Performance Suite (ARC-c, GPQA*, MMLU-Pro) (test)	On-Policy	ARC-c Score	91.4	73	1mo ago
BIG-Bench Hard	Qwen 3 VL 32B Think	Accuracy	91.1	68	2mo ago
General Reasoning Suite Average		Pass@1	78.3	63	2mo ago
MMLU-Pro		MMLU-Pro General Reasoning Avg@8 Acc	90.1	63	3mo ago
GPQA	Gemini 2.5 Pro	Accuracy	86.4	59	1mo ago
GPQA Diamond	gemini-2.5-pro	Pass@1 Accuracy	86.4	57	1mo ago
MMMU	GPT5	Overall Score	85.4	57	22d ago
LiveBench	PerSyn	Accuracy	53.47	55	1mo ago
MMLU	Pivot-SFT	Accuracy	86.21	51	2mo ago
General Reasoning Suite MMLU Pro, Super GPQA, GPQA Diamond, BBEH		MMLU Pro	84	47	1mo ago
Overall	DTSR	Accuracy	84.8	40	3mo ago
MMLU-R	ROSA2	Accuracy (MMLU-R General Reasoning)	84.4	40	1mo ago
GPQA	SRGen	pass@1	65.7	38	1mo ago
Out-of-Distribution Benchmarks MMLU-P, ARC-c, GPQA	POPO	MMLU-P Score	52.1	37	1mo ago
BIG-bench	POES	Accuracy (General)	81.6	36	3mo ago
Big-Bench Hard (BBH) (val)	TAIA	Accuracy	43.46	36	4mo ago
GPQA-Diamond & MMLU-Pro	Scaf-GRPO	Accuracy	53.6	35	2mo ago

Showing 25 of 163 rows