Benchmarks
General Language Understanding on MMLU, AlpacaEval, Arena-Hard
[Chart: MMLU Accuracy over time, highlighting Qwen2.5-7B + DataFlow-Chat-15K at 73.41 (Dec 18, 2025). Series: MMLU Accuracy, AlpacaEval Score, Arena-Hard Win Rate, Average Score (Aggregate).]
Evaluation Results
| Method | Configuration | Date | MMLU Accuracy | AlpacaEval Score | Arena-Hard Win Rate | Average Score (Aggregate) |
|---|---|---|---|---|---|---|
| Qwen2.5-7B + DataFlow-Chat-15K | Base Model=Qwen2.5-7B,... | 2025.12 | 73.41 | 10.11 | 110 | 28.21 |
| Qwen2.5-7B + ShareGPT-15K | Base Model=Qwen2.5-7B,... | 2025.12 | 73.09 | 3.7 | 130 | 26.03 |
| Qwen2.5-7B + UltraChat-15K | Base Model=Qwen2.5-7B,... | 2025.12 | 72.97 | 3.97 | 80 | 25.91 |
| Qwen2.5-7B | Base Model=Qwen2.5-7B | 2025.12 | 71.45 | 7.05 | 60 | 26.36 |