Reasoning and Generative Tasks on MMLU, CMMLU, GSM8k, XSum, and StrategyQA (test)
[Chart: MMLU Accuracy over time. Highlighted point: GradMAP, 30.77 (Feb 16, 2026). Selectable metrics: MMLU Accuracy, CMMLU Accuracy, GSM8k Accuracy, XSum Score, StrategyQA Accuracy, Average Score.]
Evaluation Results
Method    Configuration              Date     MMLU Accuracy  CMMLU Accuracy  GSM8k Accuracy  XSum Score  StrategyQA Accuracy  Average Score
GradMAP   Backbone=Baichuan2-7B,...  2026.02  30.77          48.89           11.45           1.67        40.74                26.7
GradMAP   Backbone=Baichuan2-7B,...  2026.02  30.17          51.84           11.51           2.12        40.79                27.2
ShortGPT  Backbone=Baichuan2-7B      2026.02  28.59          40.62           7.89            1.67        20.13                19.78
SLEB      Backbone=Baichuan2-7B      2026.02  25.42          22.36           0.89            0.15        18.47                13.46
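The page does not say how Average Score is derived. A minimal sketch, assuming it is the unweighted mean of the five per-task scores (MMLU, CMMLU, GSM8k, XSum, StrategyQA), is shown below. The function names `unweighted_average` and `check_rows` and the 0.1 tolerance are illustrative choices, not part of the benchmark.

```python
from statistics import mean

# Scores copied from the results above, in the order
# MMLU, CMMLU, GSM8k, XSum, StrategyQA, followed by the reported average.
ROWS = [
    ("GradMAP (first row)", [30.77, 48.89, 11.45, 1.67, 40.74], 26.7),
    ("GradMAP (second row)", [30.17, 51.84, 11.51, 2.12, 40.79], 27.2),
    ("ShortGPT", [28.59, 40.62, 7.89, 1.67, 20.13], 19.78),
    ("SLEB", [25.42, 22.36, 0.89, 0.15, 18.47], 13.46),
]


def unweighted_average(scores):
    """Assumed aggregation: plain arithmetic mean of the task scores."""
    return mean(scores)


def check_rows(rows, tolerance=0.1):
    """Compare the assumed mean with each reported average.

    Returns a list of (name, computed, reported, within_tolerance) tuples.
    """
    results = []
    for name, scores, reported in rows:
        computed = unweighted_average(scores)
        results.append((name, computed, reported, abs(computed - reported) <= tolerance))
    return results


if __name__ == "__main__":
    for name, computed, reported, ok in check_rows(ROWS):
        print(f"{name}: computed={computed:.3f} reported={reported} match={ok}")
```

Under this assumption every reported average is reproduced to within 0.1. The second GradMAP row computes to roughly 27.29 against a reported 27.2, so the page may truncate rather than round, or may aggregate slightly differently.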