Mathematical Reasoning on AIME (Solved Rate %)
[Figure: Solved Rate (%) over time (May 21, 2025); top result: Self-Refine, 83.33]
Evaluation Results
Method      | Setting                   | Date    | Solved Rate (%)
Self-Refine | Model=Qwen3 32B think,... | 2025.05 | 83.33
ICRL        | Model=Qwen3 32B think,... | 2025.05 | 80
Reflexion   | Model=Qwen3 32B think,... | 2025.05 | 70
Base        | Model=Qwen3 32B think,... | 2025.05 | 66.58
ICRL        | Model=Qwen3 32B, Conte... | 2025.05 | 46.66
Self-Refine | Model=Qwen3 32B, Conte... | 2025.05 | 43.33
Reflexion   | Model=Phi-4, Context W... | 2025.05 | 40
ICRL        | Model=Phi-4, Context W... | 2025.05 | 40
ICRL        | Model=Llama 4 Maverick... | 2025.05 | 35
Reflexion   | Model=Qwen3 32B, Conte... | 2025.05 | 33.33
Self-Refine | Model=Phi-4, Context W... | 2025.05 | 33.33
Reflexion   | Model=Llama 4 Maverick... | 2025.05 | 23.33
Base        | Model=Qwen3 32B, Conte... | 2025.05 | 22.54
Self-Refine | Model=Llama 4 Maverick... | 2025.05 | 20
Base        | Model=Phi-4, Context W... | 2025.05 | 20
Base        | Model=Llama 4 Maverick... | 2025.05 | 17.58
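As a reading aid, here is a small Python sketch that computes each method's gain over Base, in percentage points, for each model setting, using the scores listed in the evaluation results. It assumes that rows sharing the same truncated setting string were run under identical configurations. The page truncates those strings, so they may hide differences. The dictionary keys ("Qwen3 32B think", "Qwen3 32B", and so on) and the helper names `gains_over_base` and `best_methods` are hypothetical shorthand introduced here, not part of the leaderboard.

```python
# Solved rates (%) copied from the leaderboard.
# ASSUMPTION: rows whose truncated setting strings match (e.g. "Model=Qwen3 32B think,...")
# share one configuration; the keys below are hypothetical shorthand labels.
RESULTS = {
    "Qwen3 32B think": {"Base": 66.58, "Self-Refine": 83.33, "ICRL": 80.0, "Reflexion": 70.0},
    "Qwen3 32B": {"Base": 22.54, "Self-Refine": 43.33, "ICRL": 46.66, "Reflexion": 33.33},
    "Phi-4": {"Base": 20.0, "Self-Refine": 33.33, "ICRL": 40.0, "Reflexion": 40.0},
    "Llama 4 Maverick": {"Base": 17.58, "Self-Refine": 20.0, "ICRL": 35.0, "Reflexion": 23.33},
}


def gains_over_base(results):
    """Return {setting: {method: percentage-point gain over Base}}, rounded to 2 decimals."""
    return {
        setting: {
            method: round(score - scores["Base"], 2)
            for method, score in scores.items()
            if method != "Base"
        }
        for setting, scores in results.items()
    }


def best_methods(scores):
    """Return the non-Base method(s) with the highest solved rate, keeping ties, sorted by name."""
    non_base = {m: s for m, s in scores.items() if m != "Base"}
    top = max(non_base.values())
    return sorted(m for m, s in non_base.items() if s == top)


if __name__ == "__main__":
    for setting, gains in gains_over_base(RESULTS).items():
        print(setting, gains, "best:", best_methods(RESULTS[setting]))
```

Ties are kept on purpose. On Phi-4, ICRL and Reflexion both reach 40, so picking a single "best" method there would depend on dictionary order rather than on the data.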