Mathematical Reasoning on AIME 2025 (Mean@64 accuracy)
[Chart: best Mean@64 Acc over time. Current best: 51.2 (Agentic Proposing), Feb 3, 2026.]
Evaluation Results

Method                  Setting                    Date     Mean@64 Acc
Agentic Proposing       training budget=10,000...  2026.02  51.2
PromptCot 2.0           training budget=10,000...  2026.02  48.5
Deepmath                training budget=10,000...  2026.02  48.2
Polaris                 training budget=10,000...  2026.02  47.4
PromptCoT               training budget=10,000...  2026.02  47.3
MathSmith               training budget=10,000...  2026.02  47.1
Socratic-zero           training budget=10,000...  2026.02  46.9
Qwen3-4B-Instruct-2507  mode=zero-shot             2026.02  46.7
OpenthoughtsS3          training budget=10,000...  2026.02  46.1
OpenR1math              training budget=10,000...  2026.02  45.8
GPT-5.2-High            training budget=10,000...  2026.02  45.8
Claude4.5-Opus          training budget=10,000...  2026.02  45.7
DeepSeek-V3.2-Spe       training budget=10,000...  2026.02  45.6
R-zero                  training budget=10,000...  2026.02  44.3
Wizardmath              training budget=10,000...  2026.02  44
NuminaMath              training budget=10,000...  2026.02  43.9
Metamath                training budget=10,000...  2026.02  43.5
Qwen3-Max               training budget=10,000...  2026.02  43.4
Gemini-3-Pro            training budget=10,000...  2026.02  43.1
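
This page does not define Mean@64 Acc. Under the common convention (an assumption here, not confirmed by the page), each problem is answered 64 times, per-problem accuracy is the fraction of correct samples, and the score is the average of those fractions, reported as a percentage. A minimal sketch of that calculation follows; the function name `mean_at_k` and the toy data are hypothetical.

```python
from typing import Sequence


def mean_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Average per-problem accuracy over k samples, as a percentage.

    `correct[i][j]` is True when sample j for problem i was graded correct.
    Assumed convention; the leaderboard itself does not spell out the formula.
    """
    if not correct:
        raise ValueError("need at least one problem")
    if any(len(samples) == 0 for samples in correct):
        raise ValueError("every problem needs at least one sample")
    # Fraction of correct samples for each problem.
    per_problem = [sum(samples) / len(samples) for samples in correct]
    # Unweighted mean across problems, scaled to a percentage.
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data (hypothetical): 2 problems, 4 samples each instead of 64.
toy = [
    [True, True, False, False],   # 2 of 4 correct
    [True, False, False, False],  # 1 of 4 correct
]
score = mean_at_k(toy)
print(score)
```

With 64 samples per problem the same function applies unchanged; averaging over many samples reduces the run-to-run variance that a single-sample accuracy would show on a small benchmark.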