SOTA Zero-shot Language Understanding benchmarks and papers with code | Wizwand
Zero-shot Language Understanding
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| ARC-Easy, ARC-Challenge, HellaSwag, LAMBADA, PIQA lm-eval 0.4.11 (test) | BF16 | Average Accuracy | 81.5 | 42 | 2mo ago |
| ARC-c, ARC-e, PIQA, Winogrande, Hellaswag | RIA+SQ+VC+EBFT | Mean Accuracy | 6,257 | 25 | 1mo ago |
| Evaluation Suite Zero-shot (LMB, HellA, PIQA, ARC-E, ARC-C, WINO, Open, MMLU) | BLT Distill (Teacher) | ARC-E Accuracy | 83.4 | 25 | 3mo ago |
| Reasoning Suite Zero-shot (BoolQ, WinoG., PIQA, OBQA, HellaS., ARC-e, ARC-c) | SparseGPT | BoolQ Accuracy | 82.63 | 24 | 2mo ago |
| Zero-shot Benchmarks | NeUQI | Average Zero-shot Accuracy | 73.07 | 21 | 2d ago |
| LM Evaluation Harness Downstream Suite (HellaSwag, PIQA, WinoGrande, OpenBookQA, SIQA, BoolQ, TriviaQA, MMLU, ARC-Challenge, ARC-Easy, MathQA, SciQ) | Llama-3.1-8B-Instruct | HellaSwag Accuracy | 72.52 | 18 | 1d ago |
| Qwen3-0.6B Zero-shot Evaluation Suite (AE, AC, SciQ, MM, MM-P, HS, OBQA, PIQA, RACE, WG, CSQA, AGI) (test) | AMO | Accuracy (AE) | 68.64 | 15 | 15d ago |
| LLM Evaluation Suite MMLU, GSM8k, HellaSwag, WinoGrande | FP16 | MMLU | 72.8 | 12 | 3mo ago |
| BoolQ, ARC-e, ARC-c, WinoGrande, HellaSwag | MoEITS | ARC-e Accuracy | 83.08 | 8 | 1mo ago |
| Commonsense Reasoning and Language Modeling Suite (PIQA, SIQA, HellaSwag, ARC-E, ARC-C, WinoGrande, LAMBADA) zero-shot | Dense | PIQA Accuracy | 70.46 | 6 | 22d ago |
| Combined Zero-shot | FP16 | Average Accuracy | 63.71 | 4 | 1mo ago |
| LLM Evaluation Suite (AE, AC, SciQ, MM, MM-P, HS, OBQA, PIQA, RACE, WG, CSQA, AGI) zero-shot | AMO | AE Score | 67.97 | 2 | 15d ago |