Math on SVAMP (Accuracy and Significance Testing)
[Chart: Accuracy (%) over time, Mar 26, 2026 to May 26, 2026, with Ref. SOTA marked at 94.2. Selectable metrics: Accuracy (%), Delta (Δ), p-value.]
Updated 7d ago
Evaluation Results
| Method | Details | Date | Accuracy (%) | Delta (Δ) | p-value |
|---|---|---|---|---|---|
| Ref. SOTA | Model Type=Proprietary | 2026.03 | 94.2 | - | - |
| EcoThink | | 2026.03 | 92.8 | -1.4 | 0.068 |
| DART | Base Model=Qwen3-0.6B | 2026.05 | 81.37 | - | - |
| Original-SFT | Base Model=Qwen3-0.6B | 2026.05 | 80.9 | - | - |
| GRPO | Base Model=Qwen3-0.6B | 2026.05 | 80 | - | - |
| STaR | Base Model=Qwen3-0.6B | 2026.05 | 79.7 | - | - |
| Base | Base Model=Qwen3-0.6B | 2026.05 | 79.03 | - | - |
| TESSY | Base Model=Qwen3-0.6B | 2026.05 | 79 | - | - |
| DART | Base Model=Qwen2.5-0.5... | 2026.05 | 61.45 | - | - |
| GRPO | Base Model=Qwen2.5-0.5... | 2026.05 | 56.4 | - | - |
| Original-SFT | Base Model=Qwen2.5-0.5... | 2026.05 | 56.17 | - | - |
| STaR | Base Model=Qwen2.5-0.5... | 2026.05 | 55.2 | - | - |
| Base | Base Model=Qwen2.5-0.5... | 2026.05 | 53.95 | - | - |
| TESSY | Base Model=Qwen2.5-0.5... | 2026.05 | 47.8 | - | - |
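
The one reported Delta (Δ) is consistent with the method's accuracy minus the Ref. SOTA accuracy (92.8 - 94.2 = -1.4 for EcoThink). The page does not say which significance test produced the p-value, so the sketch below uses a pooled two-proportion z-test purely as a stand-in. The function names and the sample size `n` are hypothetical, and the result is not expected to reproduce the listed 0.068.

```python
import math


def delta(accuracy: float, ref_accuracy: float) -> float:
    """Accuracy difference in percentage points relative to the reference."""
    return round(accuracy - ref_accuracy, 2)


def two_proportion_p_value(acc_a: float, acc_b: float, n: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test.

    acc_a and acc_b are accuracies in percent. n is the number of test
    problems behind each accuracy (assumed equal for both; this value is
    an assumption, the page does not report it).
    """
    p_a, p_b = acc_a / 100.0, acc_b / 100.0
    pooled = (p_a + p_b) / 2.0
    se = math.sqrt(pooled * (1.0 - pooled) * 2.0 / n)
    if se == 0.0:
        # Both accuracies are 0% or both are 100%: no evidence of a difference.
        return 1.0
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    print(delta(92.8, 94.2))
    # n=1000 is a placeholder sample size, not taken from the page.
    print(two_proportion_p_value(92.8, 94.2, n=1000))
```

A paired test on per-problem outcomes (for example McNemar's test) is generally more appropriate when both methods are scored on the same problems, but it needs per-example results that the table does not provide.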