
Commonsense Reasoning

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) (test) | BoolQ Accuracy: 88 | 202 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) | BoolQ Accuracy: 82.88 | 129 |
| Commonsense Reasoning | Commonsense Reasoning | Accuracy: 85 | 44 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, HellaSwag, Winogrande) zero-shot | Avg Commonsense Accuracy: 84.9 | 34 |
| Commonsense Reasoning | Commonsense Reasoning (PIQA, WinoG., HellaS., BoolQ, SIQA, OBQA) (test) | PIQA Accuracy: 89.9 | 32 |
| Commonsense Reasoning | Commonsense Reasoning (OBQA, ARC-C, Wino, PIQA, Social, ARC-E, BoolQ, Hella) | OBQA: 94.8 | 24 |
| Commonsense Reasoning | Commonsense Reasoning LLaMA2-7B | Average Accuracy: 79.68 | 18 |
| Commonsense Reasoning | Commonsense Reasoning Tasks (ARC-C, ARC-E, HellaSwag, LAMBADA, PIQA, WinoGrande) | ARC-C Accuracy: 41.47 | 13 |
| Commonsense Reasoning | Commonsense Reasoning 8 datasets | BoolQ Accuracy: 73.6 | 11 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, OBQA) LLaMA2 7B backbone (test) | BoolQ Accuracy: 88.5 | 10 |
| Commonsense Reasoning | Commonsense Reasoning MC+PEFT LLaMA3.2-1B (test) | BoolQ Accuracy: 62.4 | 8 |
| Commonsense Reasoning | Commonsense Reasoning (OpenBookQA, ARC-E, ARC-C, WinoGrande, PIQA, MathQA, HellaSwag) | OpenBookQA: 34 | 7 |
| Commonsense Reasoning | Commonsense Reasoning (HellaSwag, OBQA, WinoGrande, ARC, PIQA) | HellaSwag: 52.3 | 5 |
| Commonsense Reasoning | Commonsense Reasoning Tasks HellaSwag, PIQA, WinoGrande | HellaSwag Accuracy: 33.9 | 4 |