
COUNTDOWN

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | Countdown | Accuracy | 85 | 252 |
| Planning | Countdown | Accuracy | 87.5 | 89 |
| Mathematical Reasoning | Countdown (test) | Accuracy | 85 | 84 |
| Reasoning | Countdown | Accuracy | 83.2 | 49 |
| Planning | Countdown | Accuracy | 85.6 | 27 |
| Logical Planning | Countdown (test) | Accuracy | 84.8 | 24 |
| Symbolic Reasoning | Countdown | Accuracy | 49.61 | 24 |
| Arithmetic Reasoning | Countdown | Accuracy | 33.6 | 19 |
| Logical Reasoning | Countdown | Accuracy | 52 | 16 |
| Arithmetic Reasoning | Countdown 512 tokens | Pass@1 | 62.1 | 15 |
| Arithmetic Reasoning | Countdown 256 tokens | Pass@1 | 71.1 | 15 |
| Planning and Reasoning | Countdown | Accuracy | 81.6 | 14 |
| Planning | Countdown (held-out) | Pass@1 | 87.96 | 14 |
| Logical Reasoning | Countdown CD34 | Avg@16 | 78.2 | 14 |
| Logical Reasoning | Countdown CD4 | Avg@16 | 59.4 | 14 |
| Numerical Reasoning | Countdown-4 | CD4 | 98.9 | 13 |
| Mathematical Reasoning | Countdown | Accuracy (L=128) | 54.3 | 13 |
| Reasoning | COUNTDOWN (test) | Accuracy | 66.02 | 13 |
| Countdown | Countdown 3 (test) | Reward | 95 | 12 |
| Mathematical Reasoning | Countdown 4,5,6-arg held-out difficulties (test) | Accuracy | 25.1 | 10 |
| Mathematical Reasoning | Countdown 8B Instruct (test) | Accuracy | 46.1 | 9 |
| Mathematical Reasoning | Countdown-34 (held-out) | Accuracy | 81.26 | 8 |
| Uncertainty Quantification | Countdown | ROC-AUC (128) | 0.61 | 8 |
| Arithmetic Reasoning | Countdown 0-shot (test) | Pass@1 (Greedy) | 71.5 | 7 |
| Mathematical Reasoning | Countdown | CD4 Score | 49.8 | 6 |
Showing 25 of 35 rows
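The table mixes several sampling-based metrics (Pass@1, Pass@1 (Greedy), Avg@16) without defining them, so results under different metrics are not directly comparable. The sketch below assumes the common conventions: Pass@k is the unbiased estimator 1 - C(n-c, k) / C(n, k) computed from n samples of which c are correct, and Avg@k is the mean per-sample accuracy over k samples per problem. These definitions are assumptions, not taken from the page, and the function names are hypothetical. Under them, Pass@1 estimated from 16 samples equals Avg@16, whereas Pass@1 (Greedy) scores a single greedy decode per problem.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct (assumed convention)."""
    if n - c < k:
        # Every size-k subset of the n samples contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(per_problem: list[list[bool]], k: int) -> float:
    """Average the per-problem pass@k estimates over the whole benchmark."""
    return mean(pass_at_k(len(s), sum(s), k) for s in per_problem)


def avg_at_k(per_problem: list[list[bool]]) -> float:
    """Hypothetical Avg@k: mean over problems of (correct samples / samples drawn)."""
    return mean(sum(s) / len(s) for s in per_problem)


if __name__ == "__main__":
    # Hypothetical correctness flags: 3 problems x 4 sampled answers each.
    results = [
        [True, True, False, True],
        [False, False, False, False],
        [True, False, False, False],
    ]
    print(round(100 * avg_at_k(results), 1))                 # 33.3
    print(round(100 * benchmark_pass_at_k(results, 1), 1))   # 33.3, same as Avg@4
    print(round(100 * benchmark_pass_at_k(results, 2), 1))   # 50.0
```

Because pass_at_k with k = 1 reduces to c / n, any Pass@1 figure obtained by sampling is only comparable to a greedy Pass@1 figure if the sampling temperature and sample count are also reported.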