Downstream Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero-shot Downstream Evaluation | Downstream Suite Zero-shot (ARC-E, ARC-C, HellaS., PIQA, WG, OBQA, SciQ, BoolQ) | ARC-Easy Accuracy: 74.82 | 26 |
| Zero-shot Downstream Accuracy | Downstream Suite Zero-shot (BoolQ, HellaSwag, PIQA, RACE, WinoGrande) | BoolQ Accuracy: 82.4 | 19 |
| Zero-shot Question Answering and Reasoning | Downstream Suite Zero-shot (PIQA, HS, ARC, WG, RTE, OQA, BoolQ) | PIQA Accuracy: 80.79 | 12 |
| General Evaluation | Downstream Suite | Average Score: 39.38 | 8 |
| Downstream Task Evaluation | Downstream Suite (BoolQ, PIQA, HS, WG, ARC-e, ARC-c, OBQA) Zero-shot | Accuracy (BoolQ): 77.7 | 5 |