Downstream Tasks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero-shot Evaluation | Downstream Tasks Zero-shot | Accuracy: 76 | 278 |
| Zero-shot Evaluation | Downstream Tasks MMLU, PIQA, Arc-E, Arc-C, Wino, OpenQA | MMLU: 77.62 | 218 |
| Downstream Task | 11 Downstream Tasks Aggregate | Average Accuracy: 64.6 | 32 |
| Language Modeling Downstream Evaluation | Downstream Tasks Average (test) | Average Score: 58.02 | 24 |
| Zero-shot Evaluation | 6 Zero-shot Downstream Tasks | Average Accuracy: 80.05 | 19 |
| Diverse Language Understanding | 62 Downstream Tasks | Average Accuracy: 67.5 | 18 |
| Utility Evaluation | Downstream Tasks | Average Accuracy: 63.4 | 12 |
| Zero-shot Task Evaluation | 9 Downstream Tasks Utility | Average Accuracy: 54.7 | 10 |
| Multiple Choice Evaluation | Downstream Tasks (ARC, HellaSwag, PIQA, Winogrande) Standard LLM Eval (val/test) | Average Accuracy: 56.98 | 9 |
| Downstream Task Evaluation | 15 Downstream Tasks Summary | Median EG: 2 | 7 |
| Class-conditioned Generation | Average of 8 Downstream Tasks | FID: 10.63 | 7 |
| Zero-shot Downstream Task Evaluation | Downstream Tasks Zero-shot (Arc-C, Arc-E, BoolQ, COPA, MMLU, OBQA, PIQA, RTE, Winogrande) | Arc-C: 49.32 | 6 |
| Data-Incremental Learning | Downstream Tasks (test) | Accuracy: 62.35 | 6 |
| Class-Incremental Learning | Downstream Tasks (test) | Accuracy: 62.53 | 6 |
| Zero-shot Evaluation | Zero-shot Downstream Tasks (Arc-E, PIQA, HellaSwag, OpenBookQA, Winogrande, MMLU, BoolQ), Llama-1B Benchmark Suite (test) | Arc-E Accuracy: 31.63 | 5 |
| Zero-shot Learning | 7 Downstream Tasks Avg | Average Score: 64.53 | 4 |
| Downstream Task Evaluation | Downstream Tasks Aggregate | Accuracy: 54.37 | 3 |
| Multiple Choice Question Answering | Downstream Tasks (ARC-E, ARC-C, SciQ, PIQA, MMLU, CMMLU, CEVAL) | ARC-E Accuracy: 69.7 | 2 |