
Common Sense Reasoning Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Common Sense Reasoning | Common Sense Reasoning Suite (PIQA, ARC-Easy, ARC-Challenge, BoolQ, HellaSwag, Winogrande) zero-shot (test dev) | PIQA: 79.35 | 30 |
| Common Sense Reasoning | Common Sense Reasoning Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) zero-shot | PIQA: 81.12 | 20 |
| Common-sense Reasoning | Common-sense Reasoning Suite (PIQA, HellaSwag, ARC-C, ARC-E, OBQA) (test) | PIQA Accuracy: 84.1 | 18 |
| Commonsense Reasoning | Common-sense Reasoning Suite Zero-shot | Average Accuracy (8 Tasks): 52.92 | 10 |
| Common Sense Reasoning | Common Sense Reasoning Suite ARC, BoolQ, RTE, Winogrande, TruthfulQA | ARC Challenge Accuracy: 34 | 8 |
| Common-sense Reasoning | Common-sense Reasoning Suite | Average Score (8 tasks): 42.91 | 6 |
| Zero-shot task classification | Common Sense Reasoning Suite (PIQA, HellaSwag, WSC, BoolQ, RACE-H) zero-shot | PIQA: 71.22 | 5 |