
Multi-benchmark Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Language Modeling and Reasoning | Multi-benchmark Suite (AGIEval, GSM8K, MATH, Natural Questions, SimpleQA, TriviaQA, SuperGPQA) (cumulative) | AGIEval (EN): 90.98 |