Mathematical Reasoning on AIME 2024 (Mean@64 accuracy)
Top result: Agentic Proposing, 53.6 Mean@64 Accuracy
[Chart: Mean@64 Accuracy by date; latest point Feb 3, 2026]
Evaluation Results
Method                    Setting                      Date      Mean@64 Accuracy
Agentic Proposing         training budget=10,000...    2026.02   53.6
Polaris                   training budget=10,000...    2026.02   51.7
Deepmath                  training budget=10,000...    2026.02   51.2
PromptCot 2.0             training budget=10,000...    2026.02   50.9
MathSmith                 training budget=10,000...    2026.02   50.3
Socratic-zero             training budget=10,000...    2026.02   50.2
PromptCoT                 training budget=10,000...    2026.02   50.1
Qwen3-4B-Instruct-2507    mode=zero-shot               2026.02   49.8
OpenthoughtsS3            training budget=10,000...    2026.02   49.6
DeepSeek-V3.2-Spe         training budget=10,000...    2026.02   49.5
OpenR1math                training budget=10,000...    2026.02   49.4
Claude4.5-Opus            training budget=10,000...    2026.02   48.3
GPT-5.2-High              training budget=10,000...    2026.02   47.7
NuminaMath                training budget=10,000...    2026.02   47.3
Qwen3-Max                 training budget=10,000...    2026.02   46.2
R-zero                    training budget=10,000...    2026.02   45.8
Gemini-3-Pro              training budget=10,000...    2026.02   45.5
Wizardmath                training budget=10,000...    2026.02   44.8
Metamath                  training budget=10,000...    2026.02   44.6
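
This page does not define Mean@64 Accuracy. A common reading, assumed here, is that each benchmark problem is sampled 64 times, the fraction of correct samples is taken per problem, and those fractions are averaged over all problems and reported as a percentage. The sketch below illustrates that assumed definition only; the function name `mean_at_k` and the toy data are hypothetical and do not come from the leaderboard or from any listed method.

```python
from statistics import mean


def mean_at_k(correct: list[list[bool]]) -> float:
    """Return Mean@k as a percentage.

    correct[i][j] is True when sample j for problem i was graded correct.
    Assumed definition: average the per-problem fraction of correct
    samples over all problems.
    """
    if not correct:
        raise ValueError("no problems given")
    # Fraction of correct samples for each problem.
    per_problem = [mean(1.0 if c else 0.0 for c in samples) for samples in correct]
    # Average over problems, scaled to a percentage.
    return 100.0 * mean(per_problem)


# Toy data: 2 problems with k=4 samples each (a real run would use k=64).
toy = [
    [True, True, False, False],   # 2 of 4 correct
    [True, False, False, False],  # 1 of 4 correct
]
score = mean_at_k(toy)
```

Under this reading, a score such as 53.6 would mean that a sampled answer is correct a little over half the time on average, which is a stricter view than reporting whether any one of the 64 samples is correct.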