Logical Reasoning on ZebraLogic (held-out)
Top result: CORE, 71.7 accuracy
[Chart: Accuracy (y-axis) by date; point shown at May 27, 2026]
Updated 6d ago
Evaluation Results
Method          Training samples    Date       Accuracy
CORE            10                  2026.05    71.7
GEPA            100                 2026.05    70.7
CORE            5                   2026.05    70
MemRL           5                   2026.05    68.3
CORE            100                 2026.05    66.3
GEPA            5                   2026.05    59.7
MemRL           100                 2026.05    58.7
GEPA            10                  2026.05    57
MemRL           10                  2026.05    54.3
Episodic RAG    5                   2026.05    54
GRPO            10                  2026.05    53.3
GRPO            5                   2026.05    52.3
GRPO            100                 2026.05    52
No Learning     5                   2026.05    50.9
No Learning     10                  2026.05    50.9
No Learning     100                 2026.05    50.9
Episodic RAG    100                 2026.05    49.7
Episodic RAG    10                  2026.05    49.3