
Commonsense Reasoning Benchmarks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero-shot Common-sense Reasoning | Commonsense Reasoning Benchmarks (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) zero-shot | Avg Accuracy: 70.185 | 63 |
| Commonsense Reasoning | Commonsense Reasoning Benchmarks zero-shot LLaMA-2-13B | BoolQ Accuracy (Zero-shot): 80.92 | 17 |
| Commonsense Reasoning | Commonsense Reasoning Benchmarks Aggregate | Score: 71.9 | 12 |
| Commonsense Reasoning | Commonsense Reasoning Benchmarks (PIQA, ARC, HS, WG, BoolQ, MMLU) zero-shot | PIQA Accuracy (Zero-shot): 82.1 | 10 |
| Commonsense Reasoning | Commonsense Reasoning Benchmarks (HellaSwag, WinoGrande, BoolQ) | HellaSwag: 73.6 | 5 |