Mathematical Reasoning on AIME25 (Pass@1 (Avg@32))
[Chart: Pass@1 (Avg@32); top entry Mix-RL at 28.3, Feb 3, 2026]
Evaluation Results
Method      Model Backbone  Date     Pass@1 (Avg@32)
Mix-RL      Qwen2.5...      2026.02  28.3
Mix-RL      Qwen3-4...      2026.02  28
Unfold-RL   Qwen3-4...      2026.02  27.8
Fold-RL     Qwen3-4...      2026.02  27.8
Fold-RL     Qwen2.5...      2026.02  26.9
Unfold-RL   Qwen2.5...      2026.02  26.7
Cold-Start  Qwen3-4...      2026.02  25.4
Unfold-RL   Qwen2.5...      2026.02  25.1
Unfold-RL   Qwen3-4...      2026.02  25
Cold-Start  Qwen2.5...      2026.02  24.6
Cold-Start  Qwen2.5...      2026.02  23.1
Zero-RL     Qwen3-4...      2026.02  22.5
Cold-Start  Qwen3-4...      2026.02  22
Zero-RL     Qwen2.5...      2026.02  18.1
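
The page does not define the metric. A common reading of Pass@1 (Avg@32) is: sample 32 answers per problem, take the fraction that are correct, and average that fraction over all problems, reported as a percentage. The sketch below assumes that reading; the function name `pass_at_1_avg_at_k` and the toy data are hypothetical and are not taken from the leaderboard or from any evaluated method.

```python
from typing import Sequence


def pass_at_1_avg_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Mean per-problem accuracy over k samples, as a percentage.

    correct[i][j] is True if sample j for problem i was graded correct.
    Assumption: every sample counts equally and problems are weighted equally.
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in correct:
        if not samples:
            raise ValueError("each problem needs at least one sample")
        # Fraction of this problem's samples that were correct.
        per_problem.append(sum(samples) / len(samples))
    # Average across problems, scaled to a percentage.
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data: 2 problems with 4 samples each (a real run would use 32 samples).
toy = [
    [True, False, False, True],
    [False, False, False, True],
]
score = pass_at_1_avg_at_k(toy)
print(score)
```

Averaging over many samples reduces the variance of a single-sample Pass@1 estimate, which matters on a small problem set such as AIME.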