Mathematical Reasoning on AIME'25 (Accuracy Avg@32)
[Chart: Accuracy (Avg@32). Highlighted point: PPPO, 59.69 (Dec 17, 2025).]
Evaluation Results
Method      Details                    Date     Accuracy (Avg@32)
PPPO        Backbone=Qwen3-8B, Eva...  2025.12  59.69
PPPO        Backbone=Qwen3-4B, Eva...  2025.12  53.44
DAPO-FT     Backbone=Qwen3-8B, Eva...  2025.12  49.38
DAPO        Backbone=Qwen3-8B, Eva...  2025.12  48.75
GRPO        Backbone=Qwen3-8B, Eva...  2025.12  42.29
DAPO        Backbone=Qwen3-4B, Eva...  2025.12  42.08
DAPO-FT     Backbone=Qwen3-4B, Eva...  2025.12  42.08
INTUITOR    Backbone=Qwen3-8B, Eva...  2025.12  40.83
Qwen3-8B    Backbone=Qwen3-8B, Eva...  2025.12  38.75
GRPO        Backbone=Qwen3-4B, Eva...  2025.12  37.71
Qwen3-4B    Backbone=Qwen3-4B, Eva...  2025.12  35.42
INTUITOR    Backbone=Qwen3-4B, Eva...  2025.12  35.42
PPPO        Backbone=Qwen3-1.7B, E...  2025.12  28.96
DAPO-FT     Backbone=Qwen3-1.7B, E...  2025.12  23.96
DAPO        Backbone=Qwen3-1.7B, E...  2025.12  23.33
GRPO        Backbone=Qwen3-1.7B, E...  2025.12  20
Qwen3-1.7B  Backbone=Qwen3-1.7B, E...  2025.12  18.33
INTUITOR    Backbone=Qwen3-1.7B, E...  2025.12  17.71
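
The page does not define the Accuracy (Avg@32) metric. A common reading is that 32 answers are sampled per problem and the fraction judged correct is averaged over all problems. The sketch below illustrates that reading only. The function name `avg_at_k` and the boolean grading matrix are hypothetical and are not taken from the benchmark's evaluation code.

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]]) -> float:
    """Mean accuracy, in percent, over k sampled answers per problem.

    correct[i][j] is True when sample j for problem i was graded correct.
    Assumed interpretation of Avg@k; the page itself gives no definition.
    """
    if not correct:
        raise ValueError("need at least one problem")
    k = len(correct[0])
    if k == 0 or any(len(row) != k for row in correct):
        raise ValueError("every problem needs the same non-zero number of samples")
    # Per-problem accuracy is the fraction of the k samples that were correct.
    per_problem = [sum(row) / k for row in correct]
    # The reported score is the mean of those fractions, scaled to percent.
    return 100.0 * mean(per_problem)


# Toy grading matrix: 2 problems, 4 samples each (Avg@4 rather than Avg@32).
toy = [
    [True, True, False, False],
    [True, False, False, False],
]
print(avg_at_k(toy))  # 37.5
```

With k = 32 the same function would take one row of 32 graded samples per problem.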