Language Reasoning

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Updated
BBH (BIG-Bench Hard)	PREFPO	Average BBH Score	87.5	20	1mo ago
TruthfulQA		Accuracy	40.15	12	4mo ago
Language Reasoning Average		Accuracy	73.25	12	4mo ago
DeepAccident-CCoT (val)	C-CoT	Accuracy	84.2	6	2mo ago
LangR unseen tasks (test)	SGE	Pass@1	60.8	3	4mo ago
