Reasoning on AIME 25 (average score)
[Chart: Average Score by date. Highlighted point: REINFORCE++ (Ours), 44.11, Dec 1, 2025.]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Average Score |
|---|---|---|---|
| REINFORCE++ (Ours) | Base Model=Qwen3-8B, T... | 2025.12 | 44.11 |
| Qwen3-8B + SFT (STAR-1) | Base Model=Qwen3-8B, T... | 2025.12 | 42.55 |
| Qwen3-8B + CPO | Base Model=Qwen3-8B, T... | 2025.12 | 42.29 |
| Qwen3-8B (thinking) | Base Model=Qwen3-8B, T... | 2025.12 | 40.57 |
| Qwen3-8B + SFT (SafeChain) | Base Model=Qwen3-8B, T... | 2025.12 | 39.06 |
| Qwen3-8B + SFT (R2D-R1) | Base Model=Qwen3-8B, T... | 2025.12 | 36.04 |
| REINFORCE++ (Ours) | Base Model=DeepSeek-R1... | 2025.12 | 32.14 |
| DeepSeek-R1-Distill-Qwen-7B + SFT (STAR-1) | Base Model=DeepSeek-R1... | 2025.12 | 31.87 |
| DeepSeek-R1-Distill-Qwen-7B | Base Model=DeepSeek-R1... | 2025.12 | 30.52 |
| DeepSeek-R1-Distill-Qwen-7B + SFT (R2D-R1) | Base Model=DeepSeek-R1... | 2025.12 | 29.38 |
| DeepSeek-R1-Distill-Qwen-7B + SFT (SafeChain) | Base Model=DeepSeek-R1... | 2025.12 | 28.64 |
| DeepSeek-R1-Distill-Qwen-7B + CPO | Base Model=DeepSeek-R1... | 2025.12 | 27.86 |