Mathematical Reasoning on PolyMath
Best Accuracy: 20.9, self-cons (ours)
[Chart: Accuracy, May 31, 2026]
Updated 1d ago
Evaluation Results
Method            Model Size  Date     Accuracy
self-cons (ours)  14B         2026.05  20.9
self-cons (ours)  7B          2026.05  20.5
S1                7B          2026.05  20.2
S1                14B         2026.05  19.5
base              7B          2026.05  18
base              14B         2026.05  17.5
LIDR              7B          2026.05  16.2
LIDR              14B         2026.05  15.2
self-cons (ours)  1.5B        2026.05  8.1
S1                1.5B        2026.05  8
base              1.5B        2026.05  7.5
LIDR              1.5B        2026.05  5.9