
Reasoning Tasks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Zero-shot Reasoning | Reasoning Tasks (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) Zero-shot | BoolQ Accuracy (Zero-shot): 82.813 | 55
Zero-shot Reasoning | Zero-Shot Reasoning Tasks (ARC-C, ARC-E, BoolQ, Hella, OBQA, PIQA, SIQA, Wino) | ARC-C Accuracy: 65.53 | 54
Reasoning | Reasoning Tasks Average | Average Score: 68.6 | 32
Zero-shot Evaluation | Reasoning tasks | Reasoning Accuracy: 70.7 | 7