Arithmetic Reasoning on Countdown 0-shot (test)
[Chart: Pass@1 (Greedy) by method. Top result: SPG w/ EUBO, 71.5, Oct 10, 2025.]
Updated 3d ago
Evaluation Results
| Method | Settings | Links | Pass@1 (Greedy) | Pass@1 | Pass@2 | Pass@3 | Pass@4 |
|---|---|---|---|---|---|---|---|
| SPG w/ EUBO | Mode=0-shot, Temperatu... | 2025.10 | 71.5 | 68.2 | 71.9 | 73.9 | 76.6 |
| SPG w/ mixture | Mode=0-shot, Temperatu... | 2025.10 | 71.1 | 67.5 | 72.5 | 75.1 | 76.6 |
| WD1 | Mode=0-shot, Temperatu... | 2025.10 | 54.7 | 44.3 | 60.6 | 68 | 73.1 |
| UniGRPO | Mode=0-shot, Temperatu... | 2025.10 | 44.9 | 36.8 | 55.2 | 65 | 72.3 |
| D1 | Mode=0-shot, Temperatu... | 2025.10 | 32.4 | 24.5 | 40.4 | 51.4 | 60.6 |
| LLaDA-1.5 | Mode=0-shot, Temperatu... | 2025.10 | 21.1 | 18.2 | 32.1 | 42.5 | 50 |
| LLaDA-8B-Instruct | Mode=0-shot, Temperatu... | 2025.10 | 16.8 | 15.8 | 28.1 | 37.7 | 45.3 |