Mathematical Reasoning on AMC-23 (Accuracy, Tokens)
[Chart: Accuracy over time, Mar 9, 2026 to Apr 20, 2026. Current best: LAPO-I, 78.3 Accuracy. Available metrics: Accuracy, Tokens.]
Updated 1mo ago
Evaluation Results
| Method | Model | Date | Accuracy | Tokens |
|---|---|---|---|---|
| LAPO-I | | 2026.03 | 78.3 | 3,765 |
| LAPO-D | | 2026.03 | 77.6 | 3,655 |
| ThinkPrune-4k | | 2026.03 | 76.3 | 3,839 |
| ThinkPrune-I2k | | 2026.03 | 74.3 | 2,913 |
| DeepScaler-1.5B | | 2026.03 | 74.2 | 6,416 |
| HAPO | | 2026.03 | 70.3 | 4,301 |
| AutoThink | | 2026.03 | 67.8 | 3,658 |
| Thinkless | | 2026.03 | 65.7 | 5,276 |
| STRATAGEM | STRATAGEM (Ours) | 2026.04 | 60 | - |
| Qwen3-4B-Base | Qwen3-4B-Base | 2026.04 | 50 | - |
| SPIRAL | SPIRAL | 2026.04 | 45 | - |