Mathematical Reasoning on AIME 2025 (Average Accuracy @ 32 samples)
[Chart: Average Accuracy (32 samples). Highlighted point: WIST, 13.9, Mar 22, 2026.]
Updated 24d ago
Evaluation Results
| Method | Backbone | Date | Average Accuracy (32 samples) |
|---|---|---|---|
| WIST | Qwen3-8B-Base | 2026.03 | 13.9 |
| SPICE | Qwen3-8B-Base | 2026.03 | 13.4 |
| R-Zero | Qwen3-8B-Base | 2026.03 | 12.8 |
| Base Model | Qwen3-8B-Base | 2026.03 | 11.3 |
| SPICE | Qwen3-4B-Base | 2026.03 | 10.9 |
| WIST | Qwen3-4B-Base | 2026.03 | 9.7 |
| R-Zero | Qwen3-4B-Base | 2026.03 | 7.1 |
| Base Model | Qwen3-4B-Base | 2026.03 | 6.4 |
| R-Zero | OctoThinker-8... | 2026.03 | 1.5 |
| WIST | OctoThinker-8... | 2026.03 | 1.4 |
| Base Model | OctoThinker-8... | 2026.03 | 1.1 |
| SPICE | OctoThinker-8... | 2026.03 | 0.9 |
| SPICE | OctoThinker-3... | 2026.03 | 0.8 |
| Base Model | OctoThinker-3... | 2026.03 | 0.6 |
| WIST | OctoThinker-3... | 2026.03 | 0.6 |
| R-Zero | OctoThinker-3... | 2026.03 | 0.4 |
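
The page does not spell out how "Average Accuracy (32 samples)" is computed. A minimal sketch follows, assuming the common reading of the metric: each problem is answered 32 times, the fraction of correct samples is taken per problem, and those fractions are averaged over all problems and reported as a percentage. The function name `average_accuracy` and the input layout are hypothetical and are not taken from any of the listed methods.

```python
from statistics import mean


def average_accuracy(correct_by_problem):
    """Mean per-problem accuracy over repeated samples, as a percentage.

    correct_by_problem: one inner list per problem, holding one boolean
    per sampled answer (32 per problem under the assumed protocol).
    """
    # Fraction of correct samples for each problem.
    per_problem = [
        mean(1.0 if is_correct else 0.0 for is_correct in samples)
        for samples in correct_by_problem
    ]
    # Average across problems, scaled to a percentage (assumed unit).
    return 100.0 * mean(per_problem)


if __name__ == "__main__":
    # Toy input: 2 problems with 4 samples each (not real benchmark data).
    toy = [[True, False, False, False], [False, False, False, False]]
    print(average_accuracy(toy))
```

Under this reading, a model that solves the same few problems on every sample and one that solves many problems only occasionally can land on similar scores, so the number reflects expected single-sample accuracy rather than best-of-32 performance.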