Commonsense Reasoning

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) (test) | BoolQ Accuracy: 88 | 238 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) | BoolQ Accuracy: 89.69 | 223 |
| Commonsense Reasoning | Commonsense Reasoning | Accuracy: 85 | 57 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, HellaSwag, Winogrande) zero-shot | Avg Commonsense Accuracy: 84.9 | 34 |
| Commonsense Reasoning | Commonsense Reasoning (PIQA, WinoG., HellaS., BoolQ, SIQA, OBQA) (test) | PIQA Accuracy: 89.9 | 32 |
| Visual Reasoning | Commonsense Reasoning | Jaccard Index (J): 8 | 30 |
| Commonsense Reasoning | Commonsense Reasoning | BoolQ Accuracy: 76.5 | 29 |
| Commonsense Reasoning | Commonsense Reasoning | BoolQ Accuracy: 75.1 | 27 |
| Commonsense Reasoning | Commonsense Reasoning Tasks (ARC-C, ARC-E, HellaSwag, LAMBADA, PIQA, WinoGrande) | ARC-C Accuracy: 41.47 | 25 |
| Commonsense Reasoning | Commonsense Reasoning | WinoGrande Accuracy (WG): 80.66 | 24 |
| Commonsense Reasoning | Commonsense Reasoning (OBQA, ARC-C, Wino, PIQA, Social, ARC-E, BoolQ, Hella) | OBQA: 94.8 | 24 |
| Zero-shot Commonsense Reasoning | Commonsense Reasoning PIQA HellaSwag WinoGrande ARC-Easy OpenBookQA MathQA (test) | Zero-shot Accuracy: 59 | 21 |
| Commonsense Reasoning | Commonsense Reasoning (test) | BoolQ Accuracy: 70.13 | 21 |
| Commonsense Reasoning | Commonsense Reasoning | ARC-E Accuracy: 81.19 | 20 |
| Commonsense Reasoning | Commonsense Reasoning LLaMA2-7B | Average Accuracy: 79.68 | 18 |
| Commonsense Reasoning | Commonsense Reasoning Task | HellaSwag Accuracy: 53.63 | 12 |
| Commonsense Reasoning | Commonsense Reasoning 8 datasets | BoolQ Accuracy: 73.6 | 11 |
| Agentic Routing | Commonsense Reasoning (CS) | Accuracy: 82.7 | 10 |
| Commonsense Reasoning | Commonsense Reasoning (lm-evaluation-harness) zero-shot | LAMBADA Perplexity: 11.86 | 10 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, OBQA) LLaMA2 7B backbone (test) | BoolQ Accuracy: 88.5 | 10 |
| Commonsense Reasoning | Commonsense Reasoning MC+PEFT LLaMA3.2-1B (test) | BoolQ Accuracy: 62.4 | 8 |
| Commonsense Reasoning | Commonsense Reasoning (OpenBookQA, ARC-E, ARC-C, WinoGrande, PIQA, MathQA, HellaSwag) | OpenBookQA: 34 | 7 |
| Commonsense Reasoning | Commonsense Reasoning LLaMA-3.2-3B-Instruct (test) | ARC-c: 76.1 | 6 |
| Commonsense Reasoning | Commonsense Reasoning (HellaSwag, OBQA, WinoGrande, ARC, PIQA) | HellaSwag: 52.3 | 5 |
| Commonsense Reasoning | Commonsense Reasoning Tasks HellaSwag, PIQA, WinoGrande | HellaSwag Accuracy: 33.9 | 4 |
Showing 25 of 27 rows