Mathematical Reasoning on AIME 2024 (ACC, Avg Tokens)
[Chart: ACC (%) over time. Best result: Qwen3-8B, 81.48 ACC (%), Aug 5, 2025. Available metrics: ACC (%), Avg Tokens.]
Updated 1mo ago
Evaluation Results
| Model | Method | Date | ACC (%) | Avg Tokens |
|---|---|---|---|---|
| Qwen3-8B | strategy=step-entropy... | 2025.08 | 81.48 | 11,533.57 |
| Qwen3-8B | strategy=Full COT | 2025.08 | 79.31 | 20,936.57 |
| QwQ-32B | strategy=Full COT | 2025.08 | 79.13 | 21,243.93 |
| QwQ-32B | strategy=step-entropy... | 2025.08 | 74.07 | 9,544.13 |
| DeepSeek-R1-14B | strategy=Full COT | 2025.08 | 65.52 | 15,414.83 |
| DeepSeek-R1-7B | strategy=Full COT | 2025.08 | 63.33 | 15,843.43 |
| DeepSeek-R1-14B | strategy=step-entropy... | 2025.08 | 58.62 | 8,705.57 |
| DeepSeek-R1-7B | strategy=step-entropy... | 2025.08 | 56.67 | 10,092.8 |