SOTA Zero-shot Evaluation benchmarks and papers with code | Wizwand
Zero-shot Evaluation
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| Downstream Tasks Zero-shot | FP16 | Accuracy | 76 | 278 | 1mo ago |
| Downstream Tasks MMLU, PIQA, Arc-E, Arc-C, Wino, OpenQA | Dense | MMLU | 77.62 | 218 | 1mo ago |
| Zero-shot Tasks Average | FP16 | Accuracy | 68.05 | 95 | 4d ago |
| PIQA, WinoGrande, HellaSwag, ARC (Easy and Challenge), LAMBADA (test) | FP16 | Average Accuracy | 77.07 | 90 | 1mo ago |
| Eight datasets average | Surrogate-Assisted Layer Contribution Estimation | Accuracy | 63.54 | 87 | 1mo ago |
| ArcC, ArcE, HS, PiQA, WG (test val) | FP16 Baseline | Average Accuracy | 76 | 61 | 1mo ago |
| ARC-Easy, ARC-Challenge, OpenBookQA, WinoGrande, PIQA, HellaSwag, MathQA, RTE, BoolQ zero-shot | Original (Dense) | Mean Accuracy | 71.08 | 59 | 1mo ago |
| ARC-C, ARC-E, BoolQ, HellaSwag, PIQA, RTE, WinoGrande Zero-shot | Dense | Accuracy | 72.9 | 57 | 11d ago |
| Evaluation Tasks Zero-shot Aggregate | FP16 | Avg. Accuracy | 75.41 | 39 | 22d ago |
| Various Tasks Average (test) | FP16 | Average Accuracy | 79.95 | 38 | 1mo ago |
| 6 zero-shot downstream tasks | Dense | Average Accuracy | 80.05 | 31 | 8d ago |
| Downstream Tasks PiQA ARC Hellaswag Winogrande BoolQ | FP16 | PiQA Accuracy (Zero-shot) | 84.4 | 30 | 8d ago |
| ARC-Challenge, ARC-Easy, BoolQ, CrowS-Pairs, OpenBookQA, PIQA, RACE, SiQA, TruthfulQA, WinoGrande zero-shot | HFPrune | ARC-C Accuracy | 52.1 | 26 | 1mo ago |
| Tasks Zero-shot (mean) | Dense | mAcc | 76.57 | 25 | 1mo ago |
| OpenCLIP 38 datasets | Ensemble Teacher | Average Performance | 67.3 | 22 | 1mo ago |
| StableEval 27 evals | ACED-F2 | Average Performance | 70.9 | 21 | 1mo ago |
| 7 tasks zero-shot | BTC-LLM | Mean Accuracy (Zero-shot) | 72.79 | 20 | 8d ago |
| Reasoning Benchmarks Zero-shot (BoolQ, PIQA, HellaSwag, WinoGrande, ARC) | FP16 | BoolQ Accuracy (Zero-shot) | 71.1 | 20 | 1mo ago |
| DCLM CORE V2 | Union (Random) | CORE_V2 Score | 48 | 17 | 1mo ago |
| Downstream tasks average | FP16 | Avg Zero-shot Accuracy | 81 | 16 | 8d ago |
| Zero-shot Evaluation Suite | BF16 | ARCC | 60.5 | 14 | 1mo ago |
| lm-evaluation-harness (SciQ, ARC-E, ARC-C, LogiQA, OBQA, BoolQ, HellaSwag, PIQA, WinoGrande) zero-shot | DsDm | SciQ Accuracy | 68.2 | 13 | 1mo ago |
| Zero-shot Tasks | bfloat16 | Task Avg Score | 73.99 | 10 | 8d ago |
| Winogrande, OBQA, Hellaswag, Boolq, ARC, RTE (test) | BTC-LLM | Winogrande Accuracy | 76.07 | 9 | 8d ago |
| Evaluation Suite (PiQA, LAMBADA, ARC, HellaSwag) zero-shot | Dense (full) | PiQA Accuracy | 62.47 | 9 | 24d ago |
Showing 25 of 37 rows