Mathematical Reasoning on MATH-500 (Correction Uplift %)
[Chart: Correction Uplift (%) by date; top result: ROSA, 52.94 (Sep 27, 2025)]
Updated 1mo ago
Evaluation Results
Method    Model                 Date     Correction Uplift (%)
ROSA      Qwen3-8B              2025.09  52.94
ROSA      Qwen3-0.6B            2025.09  51.31
ROSA      Qwen2.5-7B-Instruct   2025.09  36.91
Baseline  Qwen3-8B              2025.09  24.54
ROSA      Qwen2.5-0.5B-Ins...   2025.09  24.05
Baseline  Qwen3-0.6B            2025.09  17.78
ROSA      DeepSeek-R1-Dist...   2025.09  17.41
Baseline  Qwen2.5-7B-Instruct   2025.09  13.65
Baseline  Qwen2.5-0.5B-Ins...   2025.09  6.79
Baseline  DeepSeek-R1-Dist...   2025.09  6.45