
Downstream Task Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Zero-shot Evaluation | Downstream Task Suite (ARC-C, BoolQ, HellaSwag, MMLU, OBQA, PIQA, RTE, WinoGrande) zero-shot Qwen1.5-MoE-A2.7B | ARC-C Accuracy: 45 | 6 |
| Language Understanding and Reasoning | Downstream Task Suite (PIQA, ARC-e, HellaSwag, GPQA, Lambada, MMLU, BBH) | PIQA: 50.67 | 2 |