Mathematical Reasoning on Countdown-Stepwise (test)
Best Pass@1: 80.37 (REUSERL-SegCost), May 29, 2026
Evaluation Results
Method                        Configuration              Date      Pass@1
REUSERL-SegCost               Buffer=Global success...   2026.05   80.37
Pure Round-Length             -                          2026.05   77.02
REUSERL-SegCost (no buffer)   Buffer=None                2026.05   68.94
Vanilla GRPO                  -                          2026.05   68.46
Vanilla                       -                          2026.05   20.57