Multi-task Complex Understanding on MMLUPro STEM
[Chart: Accuracy over time. Highlighted result: QwQ-32B-Preview*, 71.9 Accuracy. Date shown: Feb 18, 2025.]
Updated 4d ago
Evaluation Results
Method                      Details                    Date     Accuracy
QwQ-32B-Preview*            Model Series=QwQ, Size...  2025.02  71.9
Qwen2.5-Math-72B-Instruct   Model Series=Qwen2.5-M...  2025.02  66
Llama-3.1-70B-Instruct*     Model Series=Llama-3.1...  2025.02  61.7
OpenMath2-Llama3.1-70B*     Model Series=OpenMath2...  2025.02  55
Eurus-2-7B-PRIME            Model Series=Eurus-2,...   2025.02  53.7
Qwen2.5-Math-7B-S2R-ORL     Model Series=Qwen2.5-M...  2025.02  50
Qwen2.5-Math-7B-S2R-BI      Model Series=Qwen2.5-M...  2025.02  49.8
Qwen2.5-Math-7B             Model Series=Qwen2.5-M...  2025.02  46
Qwen2.5-Math-7B-Instruct    Model Series=Qwen2.5-M...  2025.02  44.7