Our new X account is live! Follow @wizwand_team for updates
SOTA Zero-shot Evaluation benchmarks and papers with code | Wizwand
Zero-shot Evaluation
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| Downstream Tasks Zero-shot | FP16 | Accuracy | 76 | 278 | 4d ago |
| Downstream Tasks MMLU, PIQA, Arc-E, Arc-C, Wino, OpenQA | Dense | MMLU | 77.62 | 218 | 4d ago |
| PIQA, WinoGrande, HellaSwag, ARC (Easy and Challenge), LAMBADA (test) | FP16 | Average Accuracy | 77.07 | 90 | 4d ago |
| Eight datasets average | Surrogate-Assisted Layer Contribution Estimation | Accuracy | 63.54 | 87 | 4d ago |
| ArcC, ArcE, HS, PiQA, WG (test val) | FP16 Baseline | Average Accuracy | 76 | 61 | 4d ago |
| Various Tasks Average (test) | FP16 | Average Accuracy | 79.95 | 38 | 4d ago |
| Evaluation Tasks Zero-shot Aggregate | FP16 | Avg. Accuracy | 75.41 | 33 | 4d ago |
| Tasks Zero-shot (mean) | Dense | mAcc | 76.57 | 25 | 4d ago |
| OpenCLIP 38 datasets | Ensemble Teacher | Average Performance | 67.3 | 22 | 4d ago |
| StableEval 27 evals | ACED-F2 | Average Performance | 70.9 | 21 | 4d ago |
| 6 zero-shot downstream tasks | Dense | Average Accuracy | 80.05 | 19 | 4d ago |
| DCLM CORE V2 | Union (Random) | CORE_V2 Score | 48 | 17 | 4d ago |
| Zero-shot Evaluation Suite | BF16 | ARCC | 60.5 | 14 | 4d ago |
| lm-evaluation-harness (SciQ, ARC-E, ARC-C, LogiQA, OBQA, BoolQ, HellaSwag, PIQA, WinoGrande) zero-shot | DsDm | SciQ Accuracy | 68.2 | 13 | 4d ago |
| DataComp full (test) | TuneCLIP | Mean Zero-shot Accuracy | 63.47 | 8 | 4d ago |
| Reasoning tasks | FP16 | Reasoning Accuracy | 70.7 | 7 | 4d ago |
| Downstream NLU tasks 1.0 (eval) | Random Sampling KD (Ours 12+) | 0-shot Score | 57.9 | 6 | 4d ago |
| Zero-shot Downstream Tasks (Arc-e, PIQA, Hellaswag, OpenBookQA, Winogrande, MMLU, BoolQ) Llama-1B Benchmark Suite (test) | Peri-LN | Arc-e Accuracy | 31.63 | 5 | 4d ago |
| Non-reasoning tasks | FP16 | Accuracy (Zero-shot Non-reasoning) | 70.8 | 4 | 4d ago |
| GPT-3 Evaluation Suite (LAMBADA, TriviaQA, WebQs, PIQA, RACE-h, BoolQ) 1.3B various (test val) | GPT-3 1.3B (Original) | Overall Accuracy | 44.4 | 3 | 4d ago |
Showing 20 of 20 rows