
Common Sense Reasoning Tasks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Commonsense Reasoning | Common Sense Reasoning Tasks | Avg Score: 93 | 321
Common Sense Reasoning | Common Sense Reasoning Tasks (ARC-C, ARC-E, BoolQ, HellaSwag, PIQA, WinoGrande) zero-shot | Average Accuracy (Zero-Shot): 77.67 | 92
Zero-shot Reasoning | 9 Common Sense Reasoning Tasks (WinoGrande, SocialIQA, LAMBADA, MMLU, ARC-Easy, ARC-Challenge, HellaSwag, OpenBookQA, PIQA) Average | Accuracy: 72.7 | 65
Common-sense Reasoning | 5 common-sense reasoning tasks Llama-3-8B | Average Accuracy: 87.07 | 27
Common-Sense Reasoning | Common-Sense Reasoning Tasks | Wiki PPL: 16 | 18
Common-sense Reasoning | Common-sense reasoning tasks (ARC-C, ARC-E, HellaSwag, Lambada, PIQA, WinoGrande) (test) | ARC-C Accuracy: 44.88 | 16
Common-sense Reasoning | 5 common-sense reasoning tasks Llama-2-70B | Average Accuracy: 72.41 | 15
Common-sense Reasoning | 5 common-sense reasoning tasks Llama-2-13B | Accuracy: 67.81 | 15
Common-sense Reasoning | 5 common-sense reasoning tasks Llama-3-70B | Average Accuracy: 75.33 | 9
Common-sense Reasoning | Common-sense reasoning tasks WikiText, LAMBADA, PIQA, HellaSwag, WinoGrande, ARC | PPL (WikiText): 25.89 | 8
Showing 10 of 10 rows