Commonsense Reasoning

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) (test) | BoolQ Accuracy: 88 | 138 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) | BoolQ Accuracy: 74.6 | 61 |
| Commonsense Reasoning | Commonsense Reasoning | Accuracy: 85 | 44 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, HellaSwag, Winogrande) zero-shot | Avg Commonsense Accuracy: 84.9 | 34 |
| Commonsense Reasoning | Commonsense Reasoning (PIQA, WinoG., HellaS., BoolQ, SIQA, OBQA) (test) | PIQA Accuracy: 89.9 | 32 |
| Commonsense Reasoning | Commonsense Reasoning (OBQA, ARC-C, Wino, PIQA, Social, ARC-E, BoolQ, Hella) | OBQA: 94.8 | 24 |
| Commonsense Reasoning | Commonsense Reasoning LLaMA2-7B | Average Accuracy: 79.68 | 18 |
| Commonsense Reasoning | Commonsense Reasoning Tasks (ARC-C, ARC-E, HellaSwag, LAMBADA, PIQA, WinoGrande) | ARC-C Accuracy: 41.47 | 13 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, OBQA) LLaMA2 7B backbone (test) | BoolQ Accuracy: 88.5 | 10 |