Commonsense Reasoning Benchmarks

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Zero-shot Common-sense Reasoning | Commonsense Reasoning Benchmarks (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) zero-shot | Avg Accuracy | 48.92 | 20 |
| Commonsense Reasoning | Commonsense Reasoning Benchmarks zero-shot LLaMA-2-13B | BoolQ Accuracy (Zero-shot) | 80.92 | 17 |
| Commonsense Reasoning | Commonsense Reasoning Benchmarks Aggregate | Score | 71.9 | 12 |