
COUNTDOWN

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | Countdown | Accuracy 85 | 168 |
| Planning | Countdown | Accuracy 82 | 68 |
| Mathematical Reasoning | Countdown (test) | Accuracy 51.2 | 36 |
| Reasoning | Countdown | Accuracy 83.2 | 32 |
| Symbolic Reasoning | Countdown | Accuracy 49.61 | 24 |
| Logical Reasoning | Countdown | Accuracy 52 | 16 |
| Arithmetic Reasoning | Countdown 512 tokens | Pass@1 62.1 | 15 |
| Arithmetic Reasoning | Countdown 256 tokens | Pass@1 71.1 | 15 |
| Planning | Countdown (held-out) | Pass@1 87.96 | 14 |
| Logical Reasoning | Countdown CD34 | Avg@16 78.2 | 14 |
| Logical Reasoning | Countdown CD4 | Avg@16 59.4 | 14 |
| Numerical Reasoning | Countdown-4 | CD4 98.9 | 13 |
| Reasoning | COUNTDOWN (test) | Accuracy 66.02 | 13 |
| Mathematical Reasoning | Countdown 4,5,6-arg held-out difficulties (test) | Accuracy 25.1 | 10 |
| Mathematical Reasoning | Countdown 8B Instruct (test) | Accuracy 46.1 | 9 |
| Mathematical Reasoning | Countdown | Accuracy (L=128) 39.84 | 9 |
| Mathematical Reasoning | Countdown-34 (held-out) | Accuracy 81.26 | 8 |
| Uncertainty Quantification | Countdown | ROC-AUC (128) 0.61 | 8 |
| Arithmetic Reasoning | Countdown 0-shot (test) | Pass@1 (Greedy) 71.5 | 7 |
| Arithmetic Reasoning | Countdown | Pass@1 92 | 6 |
| Symbolic planning | Countdown | Exact-match Accuracy (Ngen=128) 40.6 | 6 |
| Reasoning | Countdown | Average Diffusion Steps 40.4 | 6 |
| Logical Reasoning | Countdown (test) | Accuracy Pass@1 74.7 | 5 |
| Mathematical Reasoning | Countdown (CTD) (test) | Accuracy 43.8 | 4 |
| Planning | Countdown | Score (%) 15.3 | 4 |
Showing 25 of 29 rows