
BoolQ, ARC, WinoGrande, HellaSwag

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Zero-shot Language Understanding | BoolQ, ARC-e, ARC-c, WinoGrande, HellaSwag | ARC-e Accuracy: 83.08 | 8
Natural Language Reasoning | BoolQ, ARC-e, ARC-c, WinoGrande (WinoG), HellaSwag (HelloS) | BoolQ Accuracy: 75.2 | 4