SOTA General Language Understanding benchmarks and papers with code | Wizwand
General Language Understanding
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| tinyBenchmark | No Steering | Accuracy (ARC) | 77.51 | 81 | 2mo ago |
| GLUE | Full FT | Accuracy | 92.5 | 75 | 22d ago |
| GLUE v1 (test dev) | BOMF (Ours) | MNLI | 87.86 | 40 | 3mo ago |
| MMLU | Origin | MMLU Score | 73.59 | 39 | 1mo ago |
| Standard Downstream Tasks Suite (SciQ, PIQA, WinoGrande, ARC-E, ARC-C, HellaSwag, LogiQA, BoolQ, LAMBADA, MMLU) | ConceptLM | Average Accuracy | 48.3 | 32 | 3mo ago |
| MMLU | Qwen3-4B | MMLU Accuracy | 72.45 | 29 | 7d ago |
| Average | EMoE | Average Accuracy | 72.93 | 26 | 21d ago |
| General LLM Benchmarks (ARC-C, CSQA, HellaSwag, LAMBADA, MMLU, OpenBookQA, PIQA, Winogrande) (test) | Original | ARC-C Accuracy | 59.5 | 22 | 3mo ago |
| General Ability Suite (MMLU, PIQA, ARC-E, ARC-C, BoolQ, WinoGrande, HellaSwag, TruthfulQA) | LRC | MMLU Accuracy | 65 | 20 | 5d ago |
| 12-task evaluation suite (test) | Efficient-DLM 8B | Average Score | 71.62 | 20 | 3mo ago |
| C-Eval (val) | Qwen-1.5 14B (Teacher) | Accuracy | 78.68 | 18 | 3mo ago |
| Held-out capability suite (test) | Base | AIME-2024 Accuracy | 62.9 | 16 | 15d ago |
| Overall LLM Evaluation Suite PiQA, ARC, HellaSwag, WinoGrande, MMLU v1 | LLaMA-3-8B-Lizard | Overall Accuracy | 74.6 | 16 | 1mo ago |
| General Ability Suite (C-QA, T-QA, LAM, MMLU, L-Code) | Base | Average Score | 48.1 | 16 | 3mo ago |
| 10 Benchmarks Average (test) | Base | Accuracy (Average) | 67.4 | 15 | 27d ago |
| 8 Sub-Tasks (test) | LoRA | Performance on 8 Sub-Tasks | 62.3 | 14 | 2mo ago |
| NLP Evaluation Suite (SciQ, PIQA, WG, ARC, HellaSwag, LogiQA, BoolQ, LAMBADA) | MOUE L48 | SciQ Accuracy | 58.3 | 14 | 2mo ago |
| CMMLU | NBDiff-7B-BASE | Overall Accuracy | 77.3 | 14 | 3mo ago |
| All tasks (25 tasks) (val) | polish-roberta-8k | Overall Accuracy | 85.93 | 13 | 2mo ago |
| General Language Tasks Suite (WikiText-2, MMLU, PIQA, HellaSwag, WinoGrande, ARC-Challenge) standard (various) | BF16 | PPL | 4.88 | 13 | 3mo ago |
| MMLU | Qwen2.5-14B | Accuracy | 71.6 | 12 | 1mo ago |
| Open LLM Leaderboard HuggingFace 2023a (test) | LLaMA-2-13B (Zero-shot) | ARC-c Accuracy (25-shot) | 59.4 | 12 | 2mo ago |
| Winogrande, HellaSwag, ARC, MMLU Consolidated | Teacher (DeepSeek-V2-Lite) | Average Accuracy | 71.09 | 11 | 6d ago |
| BIG-bench Mimicked | Llama 3-8b | Sports Score | 99.7 | 11 | 3mo ago |
| BIG-bench Original | Claude 2.1 | Sports Score | 99.4 | 11 | 3mo ago |
Showing 25 of 58 rows