Mathematical Reasoning and Coding on Polaris
Top result: balanced-agg, 48.79 Peak Accuracy@8
[Chart: Peak Accuracy@8 by date (Apr 14, 2026). Other available metrics: Peak Best Score@8, Last-Step Accuracy@8, Last-Step Best Score@8]
Updated 27d ago
Evaluation Results
Method        Configuration              Date     Peak Accuracy@8  Peak Best Score@8  Last-Step Accuracy@8  Last-Step Best Score@8
balanced-agg  Model=Qwen3-1.7B, Aggr...  2026.04  48.79            60.94              46.4                  55.5
token-agg     Model=Qwen3-1.7B, Aggr...  2026.04  48.42            60.53              43.49                 55.57
seq-agg       Model=Qwen3-1.7B, Aggr...  2026.04  48.12            59.5               46.14                 57.3
token-agg     Model=Qwen2.5-Math-7B,...  2026.04  35.39            47.68              32.92                 45.27
balanced-agg  Model=Qwen2.5-Math-7B,...  2026.04  34.23            47.5               33.19                 44.43
seq-agg       Model=Qwen2.5-Math-7B,...  2026.04  34.05            46.97              31.72                 43.6