
General Ability Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning and Knowledge Question Answering | General Ability Suite (ARC, HellaSwag, PIQA, BoolQ, WinoGrande, COPA, OBQA, SciQ), various (test) | ARC-C Accuracy: 36.4 | 19 |
| General Language Understanding | General Ability Suite (C-QA, T-QA, LAM, MMLU, L-Code) | Average Score: 48.1 | 16 |
| Commonsense Reasoning and Knowledge Question Answering | General Ability Suite ARC, HellaSwag, PIQA, BoolQ, WinoGrande, COPA, OBQA, SciQ | ARC-C Accuracy: - | 0 |