Common Sense Reasoning Tasks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | Common Sense Reasoning Tasks | Avg Score: 93 | 316 |
| Common Sense Reasoning | Common Sense Reasoning Tasks (ARC-C, ARC-E, BoolQ, HellaSwag, PIQA, WinoGrande) zero-shot | Average Accuracy (Zero-Shot): 77.67 | 92 |
| Common-sense Reasoning | 5 common-sense reasoning tasks Llama-3-8B | Average Accuracy: 87.07 | 27 |
| Common-Sense Reasoning | Common-Sense Reasoning Tasks | Wiki PPL: 16 | 18 |
| Common-sense Reasoning | Common-sense reasoning tasks (ARC-C, ARC-E, HellaSwag, Lambada, PIQA, WinoGrande) (test) | ARC-C Accuracy: 44.88 | 16 |
| Common-sense Reasoning | 5 common-sense reasoning tasks Llama-2-70B | Average Accuracy: 72.41 | 15 |
| Common-sense Reasoning | 5 common-sense reasoning tasks Llama-2-13B | Accuracy: 67.81 | 15 |
| Common-sense Reasoning | 5 common-sense reasoning tasks Llama-3-70B | Average Accuracy: 75.33 | 9 |
Showing 8 of 8 rows