STEM Knowledge and Reasoning on MMLU STEM
[Chart: highlighted point Qwen2.5-Math-72B-Instruct, Pass@1 82.3, Sep 18, 2024. Metrics: Pass@1, Majority@8, RM@8, Average Score.]
Updated 3mo ago
Evaluation Results
| Method | Configuration | Links | Pass@1 | Majority@8 | RM@8 | Average Score |
|---|---|---|---|---|---|---|
| Qwen2.5-Math-72B-Instruct | Reasoning Mode=Tool-In... | 2024.09 | 82.3 | 85 | 90 | - |
| Qwen2.5-Math-72B-Instruct | Reasoning Mode=Chain-o... | 2024.09 | 80.8 | 84.9 | 80.1 | - |
| GPT-4o-2024-08-06 | Reasoning Mode=Chain-o... | 2024.09 | 64.2 | - | - | - |