Mathematical Reasoning on AIME 2024 (Average Accuracy @32 samples)
[Chart: Accuracy @32 samples by date. Highlighted point: SPICE, 15.4. Date shown: Mar 22, 2026.]
Evaluation Results

| Method | Backbone | Date | Accuracy @32 samples |
|---|---|---|---|
| SPICE | Qwen3-8B-Base | 2026.03 | 15.4 |
| WIST | Qwen3-8B-Base | 2026.03 | 14.8 |
| R-Zero | Qwen3-8B-Base | 2026.03 | 14 |
| SPICE | Qwen3-4B-Base | 2026.03 | 12 |
| Base Model | Qwen3-8B-Base | 2026.03 | 11.7 |
| WIST | Qwen3-4B-Base | 2026.03 | 11.6 |
| R-Zero | Qwen3-4B-Base | 2026.03 | 10.3 |
| Base Model | Qwen3-4B-Base | 2026.03 | 9.5 |
| SPICE | OctoThinker-8... | 2026.03 | 4.8 |
| R-Zero | OctoThinker-8... | 2026.03 | 3.5 |
| WIST | OctoThinker-8... | 2026.03 | 3.2 |
| SPICE | OctoThinker-3... | 2026.03 | 2.7 |
| Base Model | OctoThinker-8... | 2026.03 | 2.4 |
| WIST | OctoThinker-3... | 2026.03 | 1.9 |
| R-Zero | OctoThinker-3... | 2026.03 | 1.8 |
| Base Model | OctoThinker-3... | 2026.03 | 1.7 |
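
The page does not spell out how "Average Accuracy @32 samples" is computed. A minimal sketch of one common reading follows: for each problem, 32 answers are sampled and graded, the fraction correct is taken per problem, and those fractions are averaged and reported as a percentage. The function name `average_accuracy_at_k` and the toy grading data are hypothetical, used only for illustration; the listed methods may use a different protocol.

```python
from typing import Sequence


def average_accuracy_at_k(correct: Sequence[Sequence[bool]], k: int = 32) -> float:
    """Mean per-problem accuracy over k samples, as a percentage.

    `correct[i][j]` is True when sample j for problem i was graded correct.
    This is an assumed definition, not one confirmed by the leaderboard.
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for flags in correct:
        if len(flags) != k:
            raise ValueError(f"expected {k} samples per problem, got {len(flags)}")
        # Fraction of the k sampled answers that were correct for this problem.
        per_problem.append(sum(flags) / k)
    # Average across problems, scaled to a percentage.
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data (made up): two problems, 32 graded samples each.
toy = [
    [True] * 8 + [False] * 24,  # 8 of 32 correct
    [False] * 32,               # 0 of 32 correct
]
score = average_accuracy_at_k(toy)
print(score)
```

Under this reading, a score reflects expected single-sample accuracy rather than whether any one of the 32 samples is correct, which is why the values can sit well below a best-of-32 figure.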