
BoolQ, PIQA, HellaSwag, WinoGrande, ARC, OBQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Commonsense Reasoning | BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA (test) | BoolQ Accuracy: 73.54 | 4