Reasoning on COUNTDOWN (test)
[Chart: Accuracy and Time (s) over time, Nov 29, 2025 to Feb 9, 2026. Top result: STP, 66.02 Accuracy.]
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | Accuracy | Time (s) |
|---|---|---|---|---|
| STP | Base model=LLaDA-8B-In... | 2026.02 | 66.02 | 142,055 |
| GRPO w/ ELBO | Base model=LLaDA-8B-In... | 2026.02 | 36.33 | 164,657 |
| EDIT | Sequence Length=256, 0... | 2025.11 | 31.6 | - |
| EDIT | Sequence Length=128, 0... | 2025.11 | 28.9 | - |
| EDIT | Sequence Length=512, 0... | 2025.11 | 27.7 | - |
| Diffu-GRPO | Base model=LLaDA-8B-In... | 2026.02 | 25.39 | 147,100 |
| LLaDA | SFT=true, Sequence Len... | 2025.11 | 20.7 | - |
| LLaDA | SFT=true, Sequence Len... | 2025.11 | 20.3 | - |
| LLaDA | SFT=false, Sequence Le... | 2025.11 | 19.9 | - |
| LLaDA | SFT=false, Sequence Le... | 2025.11 | 19.5 | - |
| LLaDA | SFT=true, Sequence Len... | 2025.11 | 19.5 | - |
| LLaDA-8B-Instruct | Base model=LLaDA-8B-In... | 2026.02 | 16.8 | - |
| LLaDA | SFT=false, Sequence Le... | 2025.11 | 16.4 | - |