Logical Reasoning on Countdown CD4
[Chart: Avg@16 over time; top entry is GPS at 59.4 Avg@16 (Feb 2, 2026)]
Evaluation Results
| Method   | Configuration              | Date    | Avg@16 |
|----------|----------------------------|---------|--------|
| GPS      | Backbone=Qwen3-8B, Fin...  | 2026.02 | 59.4   |
| DS       | Backbone=Qwen3-8B, Fin...  | 2026.02 | 58.7   |
| GPS      | Backbone=Qwen3-4B, Fin...  | 2026.02 | 57.2   |
| DS       | Backbone=Qwen3-4B, Fin...  | 2026.02 | 56.1   |
| MoPPS    | Backbone=Qwen3-8B, Fin...  | 2026.02 | 55.5   |
| GRESO    | Backbone=Qwen3-8B, Fin...  | 2026.02 | 54.1   |
| GRESO    | Backbone=Qwen3-4B, Fin...  | 2026.02 | 53.8   |
| PCL      | Backbone=Qwen3-8B, Fin...  | 2026.02 | 53.5   |
| MoPPS    | Backbone=Qwen3-4B, Fin...  | 2026.02 | 52.9   |
| Uniform  | Backbone=Qwen3-8B, Fin...  | 2026.02 | 52.5   |
| Uniform  | Backbone=Qwen3-4B, Fin...  | 2026.02 | 51.1   |
| PCL      | Backbone=Qwen3-4B, Fin...  | 2026.02 | 51     |
| Qwen3-8B | Backbone=Qwen3-8B, Fin...  | 2026.02 | 2.1    |
| Qwen3-4B | Backbone=Qwen3-4B, Fin...  | 2026.02 | 1.3    |