Multi-benchmark Suite

Benchmarks

Task Name	Dataset Name	SOTA Result	Trend
Language Modeling and Reasoning	Multi-benchmark Suite (AGIEval, GSM8K, MATH, Natural Questions, SimpleQA, TriviaQA, SuperGPQA) (cumulative)	AGIEval (EN): 90.98		20
