
BoolQ, PIQA, HellaSwag, WinoGrande, ARC, OBQA, MTQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Zero-shot Language Reasoning | BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA, MTQA zero-shot | BoolQ Accuracy: 82.11 | 21