Reasoning and Generative Tasks on MMLU, CMMLU, GSM8k, XSum, and StrategyQA (test)
[Chart: MMLU Accuracy by method over time; highlighted result GradMAP, 30.77 (Feb 16, 2026). Selectable metrics: MMLU Accuracy, CMMLU Accuracy, GSM8k Accuracy, XSum Score, StrategyQA Accuracy, Average Score.]
Evaluation Results
| Method | Configuration | Date | MMLU Accuracy | CMMLU Accuracy | GSM8k Accuracy | XSum Score | StrategyQA Accuracy | Average Score |
|---|---|---|---|---|---|---|---|---|
| GradMAP | Backbone=Baichuan2-7B,... | 2026.02 | 30.77 | 48.89 | 11.45 | 1.67 | 40.74 | 26.7 |
| GradMAP | Backbone=Baichuan2-7B,... | 2026.02 | 30.17 | 51.84 | 11.51 | 2.12 | 40.79 | 27.2 |
| ShortGPT | Backbone=Baichuan2-7B | 2026.02 | 28.59 | 40.62 | 7.89 | 1.67 | 20.13 | 19.78 |
| SLEB | Backbone=Baichuan2-7B | 2026.02 | 25.42 | 22.36 | 0.89 | 0.15 | 18.47 | 13.46 |
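The page does not state how Average Score is computed. The sketch below assumes it is the unweighted mean of the five task scores and recomputes it from the per-task numbers in the results. The row labels "GradMAP (config 1)" and "GradMAP (config 2)" are hypothetical names for the two GradMAP entries, whose full configurations are truncated on the page.

```python
# Hypothetical check: assumes "Average Score" is the unweighted mean of the
# five per-task scores (MMLU, CMMLU, GSM8k, XSum, StrategyQA).
from statistics import mean

# Per-task scores from the evaluation results, in column order.
# "config 1" / "config 2" are illustrative labels, not names from the page.
ROWS = {
    "GradMAP (config 1)": [30.77, 48.89, 11.45, 1.67, 40.74],
    "GradMAP (config 2)": [30.17, 51.84, 11.51, 2.12, 40.79],
    "ShortGPT": [28.59, 40.62, 7.89, 1.67, 20.13],
    "SLEB": [25.42, 22.36, 0.89, 0.15, 18.47],
}


def average_score(scores):
    """Unweighted mean of the per-task scores, rounded to two decimals."""
    return round(mean(scores), 2)


if __name__ == "__main__":
    # Print the recomputed average for each method.
    for method, scores in ROWS.items():
        print(f"{method}: {average_score(scores)}")
```

Under this assumption, the first GradMAP row, ShortGPT, and SLEB reproduce their listed averages (26.7, 19.78, 13.46). The second GradMAP row gives 27.29 against a listed 27.2, so the leaderboard may truncate or aggregate slightly differently.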