General Ability Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| General Language Understanding | General Ability Suite (MMLU, PIQA, ARC-E, ARC-C, BoolQ, WinoGrande, HellaSwag, TruthfulQA) | MMLU Accuracy: 65 | 20 |
| Commonsense Reasoning and Knowledge Question Answering | General Ability Suite (ARC, HellaSwag, PIQA, BoolQ, WinoGrande, COPA, OBQA, SciQ) various (test) | ARC-C Accuracy: 36.4 | 19 |
| General Language Understanding | General Ability Suite (C-QA, T-QA, LAM, MMLU, L-Code) | Average Score: 48.1 | 16 |
| Commonsense Reasoning and Knowledge Question Answering | General Ability Suite ARC, HellaSwag, PIQA, BoolQ, WinoGrande, COPA, OBQA, SciQ | ARC-C Accuracy: - | 0 |