Mathematical Reasoning on AIME 2025: Accuracy (%)
[Chart: Accuracy (%) by date; top entry SPIRAL at 40.8, Jun 30, 2025]
Evaluation Results
Method | Configuration | Date | Accuracy (%)
SPIRAL | Backbone=DeepSeek-Dist... | 2025.06 | 40.8
DeepSeek-Distill-Qwen-7B | Backbone=DeepSeek-Dist... | 2025.06 | 39.5
SFT | Backbone=DeepSeek-Dist... | 2025.06 | 36.6
SPIRAL | Backbone=Qwen3-8B, Tra... | 2025.06 | 16.8
SPIRAL | Backbone=Qwen3-4B, Tra... | 2025.06 | 15.6
SFT | Backbone=Qwen3-8B, Tra... | 2025.06 | 15.6
SPIRAL | Backbone=Qwen3-4B, Tra... | 2025.06 | 13.3
SFT | Backbone=Qwen3-4B, Tra... | 2025.06 | 11.7
Qwen3-8B-Base | Backbone=Qwen3-8B | 2025.06 | 11.2
SFT | Backbone=Qwen3-4B, Tra... | 2025.06 | 10.4
Qwen3-4B-Base | Backbone=Qwen3-4B | 2025.06 | 6.2
SPIRAL | Backbone=Octothinker-8... | 2025.06 | 4.8
SFT | Backbone=Octothinker-8... | 2025.06 | 3.8
SPIRAL | Backbone=Llama-3.1-8B-... | 2025.06 | 1.8
Llama-3.1-8B-Instruct | Backbone=Llama-3.1-8B-... | 2025.06 | 0.7
SFT | Backbone=Llama-3.1-8B-... | 2025.06 | 0.7
Octothinker-8B-Base | Backbone=Octothinker-8B | 2025.06 | 0.5
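To compare entries that share a backbone, one can subtract the untuned backbone's score from each fine-tuned score. The sketch below does this arithmetic for the rows above, under several stated assumptions: the short group keys ("DeepSeek-Dist", "Octothinker-8", "Llama-3.1-8B", etc.) are hypothetical labels taken from the visible, truncated backbone names; "Base" is a hypothetical label for the untuned rows (DeepSeek-Distill-Qwen-7B, Qwen3-8B-Base, Qwen3-4B-Base, Octothinker-8B-Base, Llama-3.1-8B-Instruct); and where a method has several rows for one backbone (training settings truncated as "Tra..."), only its best row is used. The result is a difference in percentage points of single reported numbers, with no variance information.

```python
from collections import defaultdict

# (method, backbone group, AIME 2025 accuracy %), transcribed from the table above.
# "Base" and the short backbone keys are illustrative labels, not names from the page.
ROWS = [
    ("SPIRAL", "DeepSeek-Dist", 40.8),
    ("Base", "DeepSeek-Dist", 39.5),
    ("SFT", "DeepSeek-Dist", 36.6),
    ("SPIRAL", "Qwen3-8B", 16.8),
    ("SPIRAL", "Qwen3-4B", 15.6),
    ("SFT", "Qwen3-8B", 15.6),
    ("SPIRAL", "Qwen3-4B", 13.3),
    ("SFT", "Qwen3-4B", 11.7),
    ("Base", "Qwen3-8B", 11.2),
    ("SFT", "Qwen3-4B", 10.4),
    ("Base", "Qwen3-4B", 6.2),
    ("SPIRAL", "Octothinker-8", 4.8),
    ("SFT", "Octothinker-8", 3.8),
    ("SPIRAL", "Llama-3.1-8B", 1.8),
    ("Base", "Llama-3.1-8B", 0.7),
    ("SFT", "Llama-3.1-8B", 0.7),
    ("Base", "Octothinker-8", 0.5),
]


def gains_over_base(rows, method):
    """Return {backbone: best accuracy of `method` minus base accuracy}, in points."""
    base = {b: acc for m, b, acc in rows if m == "Base"}
    best = defaultdict(lambda: float("-inf"))
    for m, b, acc in rows:
        if m == method:
            # Several rows per backbone differ only in (truncated) training
            # settings; keep the highest score for each backbone.
            best[b] = max(best[b], acc)
    # Round to one decimal to match the table's precision and hide float noise.
    return {b: round(best[b] - base[b], 1) for b in best if b in base}


if __name__ == "__main__":
    for method in ("SPIRAL", "SFT"):
        print(method, gains_over_base(ROWS, method))
```

Taking the best row per backbone favours methods with more reported configurations; averaging the rows instead would be an equally defensible choice.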