SOTA Knowledge Evaluation benchmarks and papers with code | Wizwand
Knowledge Evaluation
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| MMLU | W16A16-Direct | MMLU Accuracy | 82.03 | 64 | 1d ago |
| Natural Questions (NQ) (Evaluation) | GRADEpre | Accuracy | 83 | 45 | 1mo ago |
| C-Eval (test) | TopoPrior+ARG | Natural Sciences Score | 93.02 | 36 | 15d ago |
| IKP | GPT-5.5 | Accuracy (IKP) | 71.9 | 30 | 1mo ago |
| KMMLU, KMMLU Redux, KMMLU Pro, CLIcK, KoBALT, MMLU Pro, GPQA Diamond | DeepSeek-V3.1 | Accuracy | 85.1 | 21 | 3mo ago |
| MMLU-Redux | Isotonic Regression | Brier Score | 0.1083 | 18 | 3mo ago |
| ArabicMMLU | Karnak | Accuracy | 81.23 | 10 | 2mo ago |
| OALL v2 | Karnak | Accuracy | 77.44 | 9 | 2mo ago |
| M MMLU_c | ML | Accuracy (MMLU_c) | 29.87 | 7 | 1mo ago |
| Include_c | ML (15%) | Accuracy | 37.8 | 7 | 1mo ago |
| GMMLU c | MKC-e | Accuracy | 32 | 7 | 1mo ago |
| Cmmlu_c | ML (Q3) | Accuracy | 36.88 | 7 | 1mo ago |
| SuperGPQA Continual | STOC | Accuracy | 15.85 | 6 | 22d ago |
| SuperGPQA (Original) | STOC | Accuracy | 11.01 | 6 | 22d ago |
| MMLU-Redux 2.0 (Continual) | STOC | Accuracy | 33.49 | 6 | 22d ago |
| MMLU-Redux 2.0 (Original) | STOC | Accuracy | 42.03 | 6 | 22d ago |
| MMLU (Continual) | STOC | Accuracy | 32.03 | 6 | 22d ago |
| Winogrande (Evaluation) | Disagreement | Accuracy | 58 | 6 | 3mo ago |
| WikiText (eval) | Disagreement | BPB | 0.777 | 6 | 3mo ago |
| PopQA (Evaluation) | GAME-LoRA | Accuracy | 11.2 | 6 | 3mo ago |
| MMLU STEM | TSD-KD | Accuracy | 49.7 | 5 | 2mo ago |
| Overall Knowledge Aggregation (Aggregate) | CAD | Improvement (%) | 40 | 5 | 3mo ago |
| Composite (MMLU, MMLU-Pro, CMMLU, C-EVAL, GAOKAO-Bench, ARC-c, GPQA, SciBench, PHYBench, TriviaQA) | Ling-mini-2.0 | Overall Average Score | 65.77 | 4 | 3mo ago |
| MMLU-Pro (test) | T2M | Accuracy (%) | 58.84 | 2 | 1mo ago |
Showing 24 of 24 rows