Commonsense Reasoning Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Commonsense Reasoning | Commonsense Reasoning Suite (test) | HellaSwag Accuracy: 0.9594 | 62 |
| Commonsense Reasoning | Commonsense Reasoning Suite | OpenBookQA Accuracy: 35 | 48 |
| Commonsense Reasoning | Commonsense Reasoning Suite BoolQ, PIQA, HellaS, WinoG, ARC-e, ARC-c, OBQA | Average Accuracy: 71.77 | 43 |
| Commonsense Reasoning | Commonsense Reasoning Suite BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c | BoolQ Accuracy: 87.49 | 43 |
| Zero-shot Commonsense Reasoning | Commonsense Reasoning Suite | BoolQ Accuracy: 73.18 | 32 |
| Commonsense Reasoning | Commonsense Reasoning Suite (ARC-e, OBQA, SIQA, ARC-c, WinoG., PIQA) | ARC-e Accuracy: 88 | 24 |
| Commonsense Reasoning | Commonsense Reasoning Suite (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQ) (test) | BoolQ Accuracy: 63.27 | 16 |
| Commonsense Reasoning | Commonsense Reasoning Suite (PiQA, Arc-C, WinoGrande, HellaSwag, SciQ, OBQA, BoolQ, Arc-E) (test) | PiQA Accuracy: 80.79 | 15 |
| Commonsense Reasoning | Commonsense Reasoning Suite (PiQA, Arc-C, WinoGrande, HellaSwag, SciQ, OBQA, BoolQ, Arc-E) | PiQA Accuracy: 82.21 | 15 |
| Commonsense Reasoning | Commonsense Reasoning Suite LM Eval Harness | LAMBADA: 51.8 | 13 |
| Question Answering | Commonsense Reasoning Suite (ARC-e, ARC-c, BoolQ, OBQA, PIQA) (test) | ARC-e: 77.7 | 8 |
| Zero-shot Question Answering | Commonsense Reasoning Suite (PIQA, WinoGrande, HellaSwag, ARC) Zero-shot Llama-2-70B | PIQA Accuracy (Zero-shot): 82.7 | 7 |
| Commonsense Reasoning | Commonsense Reasoning Suite (Arc, Hellaswag, Obqa, Piqa, Race, Siqa, Winogrande) (test) | Arc-c: 26.54 | 4 |