Constraint Satisfaction Reasoning on ZebraLogic
[Chart: Easy Score, best result 96.8 (GPT-4o), Dec 2, 2025. Metrics: Easy Score, Hard Score, Average Score]
Updated 4d ago
Evaluation Results
| Method | Prompting | Date | Easy Score | Hard Score | Average Score |
|---|---|---|---|---|---|
| GPT-4o | decl. PL-1S | 2025.12 | 96.8 | 44 | 58.8 |
| GPT-4o | decl. PY-1S | 2025.12 | 96.4 | 62.1 | 71.7 |
| Claude 3.5 Sonnet | CoT | 2025.12 | 87.5 | 12.4 | 33.4 |
| Llama-3.1-405B | CoT | 2025.12 | 87.1 | 11.4 | 32.6 |
| GPT-4o | CoT | 2025.12 | 77.9 | 8.9 | 28.2 |
| CodeLlama13B | decl. PY-1S | 2025.12 | 74.3 | 20 | 35.2 |
| GPT-4o | PY-ZS | 2025.12 | 69.7 | 26.5 | 38.1 |
| GPT-4o | PL-ZS | 2025.12 | 64.6 | 15 | 28.9 |
| CodeLlama13B | decl. PL-1S | 2025.12 | 62.5 | 14.6 | 28.5 |