Mathematical Reasoning on AIME 25 (Correction Uplift %)
[Chart: Correction Uplift (%) by date. Highlighted point: ROSA, 27.59, Sep 27, 2025.]
Updated 1mo ago
Evaluation Results
Method    Model                 Date     Correction Uplift (%)
ROSA      Qwen3-8B              2025.09  27.59
ROSA      Qwen2.5-7B-Instruct   2025.09  20.69
ROSA      Qwen3-0.6B            2025.09  16.67
ROSA      DeepSeek-R1-Dist...   2025.09  13.79
Baseline  Qwen3-8B              2025.09  7.41
ROSA      Qwen2.5-0.5B-Ins...   2025.09  6.67
Baseline  Qwen3-0.6B            2025.09  3.57
Baseline  Qwen2.5-7B-Instruct   2025.09  3.57
Baseline  DeepSeek-R1-Dist...   2025.09  3.57
Baseline  Qwen2.5-0.5B-Ins...   2025.09  0