SOTA Zero-shot Evaluation benchmarks and papers with code | Wizwand
Zero-shot Evaluation
Benchmarks
| Dataset Name | SOTA Method | Metric | Best Score | Results | Last Updated |
|---|---|---|---|---|---|
| Downstream Tasks Zero-shot | FP16 | Accuracy | 76 | 278 | 3mo ago |
| Downstream Tasks MMLU, PIQA, Arc-E, Arc-C, Wino, OpenQA | Dense | MMLU | 77.62 | 218 | 3mo ago |
| Eight datasets average | Surrogate-Assisted Layer Contribution Estimation | Accuracy | 63.54 | 112 | 8d ago |
| Zero-shot Tasks Average | FP16 | Accuracy | 68.05 | 95 | 1mo ago |
| PIQA, WinoGrande, HellaSwag, ARC (Easy and Challenge), LAMBADA (test) | FP16 | Average Accuracy | 77.07 | 90 | 3mo ago |
| 9 diverse tasks zero-shot | FloatingPoint | Average Accuracy (Zero-shot) | 73.81 | 85 | 7d ago |
| Zero-shot Evaluation Suite (ARC, Hellaswag, LAMBADA, PIQA, Winogrande) | FP16 | ARC-C | 73.39 | 85 | 14d ago |
| Evaluation Tasks Zero-shot Aggregate | Baseline | Avg. Accuracy | 79.95 | 74 | 14d ago |
| 6 zero-shot downstream tasks | Dense | Average Accuracy | 80.05 | 70 | 14d ago |
| ArcC, ArcE, HS, PiQA, WG (test val) | FP16 Baseline | Average Accuracy | 76 | 61 | 3mo ago |
| ARC-Easy, ARC-Challenge, OpenBookQA, WinoGrande, PIQA, HellaSwag, MathQA, RTE, BoolQ zero-shot | Original (Dense) | Mean Accuracy | 71.08 | 59 | 2mo ago |
| ARC-C, ARC-E, BoolQ, HellaSwag, PIQA, RTE, WinoGrande Zero-shot | Dense | Accuracy | 72.9 | 57 | 1mo ago |
| 7 tasks zero-shot | BTC-LLM | Mean Accuracy (Zero-shot) | 72.79 | 55 | 12d ago |
| Zero-shot Evaluation Suite (ARC-c, ARC-e, HellaSwag, PIQA, WinoGrande) | FP16 | ARC-c Accuracy | 50.3 | 52 | 2d ago |
| Various Tasks Average (test) | FP16 | Average Accuracy | 79.95 | 38 | 3mo ago |
| Evaluation Benchmarks Zero-shot | Dense | Average Accuracy | 71.76 | 34 | 2d ago |
| Downstream Tasks PiQA ARC Hellaswag Winogrande BoolQ | FP16 | PiQA Accuracy (Zero-shot) | 84.4 | 30 | 1mo ago |
| Eight tasks zero-shot | Dense | Accuracy (Zero-shot) | 60.31 | 29 | 1mo ago |
| Zero-shot Tasks | bfloat16 | Task Avg Score | 73.99 | 26 | 15d ago |
| ARC-Challenge, ARC-Easy, BoolQ, CrowS-Pairs, OpenBookQA, PIQA, RACE, SiQA, TruthfulQA, WinoGrande zero-shot | HFPrune | ARC-C Accuracy | 52.1 | 26 | 2mo ago |
| Tasks Zero-shot (mean) | Dense | mAcc | 76.57 | 25 | 3mo ago |
| MobileLLM Evaluation Suite zero-shot | EDGERAZOR | ARC-e | 69.19 | 23 | 27d ago |
| OpenCLIP 38 datasets | Ensemble Teacher | Average Performance | 67.3 | 22 | 3mo ago |
| StableEval 27 evals | ACED-F2 | Average Performance | 70.9 | 21 | 3mo ago |
| Reasoning Benchmarks Zero-shot (BoolQ, PIQA, HellaSwag, WinoGrande, ARC) | FP16 | BoolQ Accuracy (Zero-shot) | 71.1 | 20 | 2mo ago |
Showing 25 of 53 rows