Reasoning on AIME 24 (average score)
[Chart: AIME 24 Average Score over time; best entry REINFORCE++ (Ours) at 75.16, Dec 1, 2025]
Evaluation Results
| Method | Settings | Date | AIME 24 Average Score |
| --- | --- | --- | --- |
| REINFORCE++ (Ours) | Base Model=Qwen3-8B, T... | 2025.12 | 75.16 |
| Qwen3-8B + SFT (STAR-1) | Base Model=Qwen3-8B, T... | 2025.12 | 74.69 |
| Qwen3-8B (thinking) | Base Model=Qwen3-8B, T... | 2025.12 | 74.22 |
| Qwen3-8B + CPO | Base Model=Qwen3-8B, T... | 2025.12 | 68.33 |
| Qwen3-8B + SFT (SafeChain) | Base Model=Qwen3-8B, T... | 2025.12 | 67.6 |
| Qwen3-8B + SFT (R2D-R1) | Base Model=Qwen3-8B, T... | 2025.12 | 60.05 |
| REINFORCE++ (Ours) | Base Model=DeepSeek-R1... | 2025.12 | 49.53 |
| DeepSeek-R1-Distill-Qwen-7B + SFT (STAR-1) | Base Model=DeepSeek-R1... | 2025.12 | 46.88 |
| DeepSeek-R1-Distill-Qwen-7B | Base Model=DeepSeek-R1... | 2025.12 | 46.3 |
| DeepSeek-R1-Distill-Qwen-7B + SFT (SafeChain) | Base Model=DeepSeek-R1... | 2025.12 | 42.6 |
| DeepSeek-R1-Distill-Qwen-7B + CPO | Base Model=DeepSeek-R1... | 2025.12 | 41.67 |
| DeepSeek-R1-Distill-Qwen-7B + SFT (R2D-R1) | Base Model=DeepSeek-R1... | 2025.12 | 39.64 |