
Reasoning and Knowledge Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Reasoning and Knowledge Evaluation | Reasoning & Knowledge Suite (ARC-E, WG, SIQA, Hella., OBQA, CSQA, BA, MMLU) | ARC-Easy Accuracy: 51.05 | 15 |
| Reasoning and Knowledge | Reasoning and Knowledge Suite (MMLU, ARC-C, ARC-E, BoolQ, CSQA, HSwag, PIQA, SocIQ, Wino) (various) | MMLU: 75.78 | 14 |