Logical Reasoning on Countdown CD34
Top result: DS, 78.2 Avg@16
[Chart: Avg@16 over time (data point at Feb 2, 2026)]
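The page does not define Avg@16. A common reading, and the one assumed in this sketch, is that each problem gets 16 sampled completions. Each problem's accuracy is the fraction of those 16 that are judged correct, and the reported figure is the mean of that accuracy over all problems, expressed as a percentage. The function name `avg_at_k` and the input layout are hypothetical choices made for illustration.

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]], k: int = 16) -> float:
    """Hypothetical Avg@k helper, returned as a percentage.

    Assumption: correct[i][j] is True when the j-th sampled completion
    for problem i is judged correct, and every problem has exactly k samples.
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in correct:
        if len(samples) != k:
            raise ValueError(f"expected {k} samples per problem, got {len(samples)}")
        # Fraction of the k samples that solved this problem.
        per_problem.append(sum(samples) / k)
    # Average over problems, scaled to the 0-100 range used in the table.
    return 100.0 * mean(per_problem)


# Two problems: one solved in all 16 samples, one solved in 8 of 16.
print(avg_at_k([[True] * 16, [True] * 8 + [False] * 8]))  # 75.0
```

Under this reading, a score such as 78.2 means that a randomly drawn sample solves a randomly drawn problem about 78% of the time. It does not mean that 78% of problems are solved at least once.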
Evaluation Results
Method | Setting | Date | Avg@16
--- | --- | --- | ---
DS | Backbone=Qwen3-8B, Fin... | 2026.02 | 78.2
GPS | Backbone=Qwen3-8B, Fin... | 2026.02 | 77.9
DS | Backbone=Qwen3-4B, Fin... | 2026.02 | 76.3
GPS | Backbone=Qwen3-4B, Fin... | 2026.02 | 76.0
MoPPS | Backbone=Qwen3-8B, Fin... | 2026.02 | 76.0
GRESO | Backbone=Qwen3-8B, Fin... | 2026.02 | 75.1
PCL | Backbone=Qwen3-8B, Fin... | 2026.02 | 74.9
MoPPS | Backbone=Qwen3-4B, Fin... | 2026.02 | 73.9
Uniform | Backbone=Qwen3-4B, Fin... | 2026.02 | 73.8
GRESO | Backbone=Qwen3-4B, Fin... | 2026.02 | 73.8
Uniform | Backbone=Qwen3-8B, Fin... | 2026.02 | 73.3
PCL | Backbone=Qwen3-4B, Fin... | 2026.02 | 72.8
Qwen3-8B | Backbone=Qwen3-8B, Fin... | 2026.02 | 3.9
Qwen3-4B | Backbone=Qwen3-4B, Fin... | 2026.02 | 3.5
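The table interleaves Qwen3-8B and Qwen3-4B runs, which makes per-backbone comparisons hard to read off directly. The sketch below regroups the scores by backbone and reports each method's gain over the Uniform row for the same backbone. The scores are transcribed from the table. The names `SCORES` and `gain_over` are hypothetical, and treating Uniform as the reference point is an assumption made for illustration.

```python
# Avg@16 scores transcribed from the table, keyed by (method, backbone).
SCORES: dict[tuple[str, str], float] = {
    ("DS", "Qwen3-8B"): 78.2,
    ("GPS", "Qwen3-8B"): 77.9,
    ("MoPPS", "Qwen3-8B"): 76.0,
    ("GRESO", "Qwen3-8B"): 75.1,
    ("PCL", "Qwen3-8B"): 74.9,
    ("Uniform", "Qwen3-8B"): 73.3,
    ("Qwen3-8B", "Qwen3-8B"): 3.9,
    ("DS", "Qwen3-4B"): 76.3,
    ("GPS", "Qwen3-4B"): 76.0,
    ("MoPPS", "Qwen3-4B"): 73.9,
    ("Uniform", "Qwen3-4B"): 73.8,
    ("GRESO", "Qwen3-4B"): 73.8,
    ("PCL", "Qwen3-4B"): 72.8,
    ("Qwen3-4B", "Qwen3-4B"): 3.5,
}


def gain_over(baseline: str, backbone: str) -> dict[str, float]:
    """Point difference of every other method vs. `baseline` on one backbone."""
    base = SCORES[(baseline, backbone)]
    return {
        method: round(score - base, 1)  # round away float noise, e.g. 4.900000000000006
        for (method, bb), score in SCORES.items()
        if bb == backbone and method != baseline
    }


for backbone in ("Qwen3-8B", "Qwen3-4B"):
    print(backbone, gain_over("Uniform", backbone))
```

On these numbers, DS leads Uniform by 4.9 points on Qwen3-8B and by 2.5 points on Qwen3-4B. On Qwen3-4B, GRESO ties Uniform and PCL falls 1.0 point below it.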