Planning on Tower of Hanoi (held-out)
Chart: Accuracy over time (best result: MemRL, 72.7, May 27, 2026).
Evaluation Results

Method         Training samples  Date     Accuracy
-------------  ----------------  -------  --------
MemRL          100               2026.05  72.7
MemRL          5                 2026.05  51.7
GEPA           5                 2026.05  43.3
CORE           100               2026.05  42.7
CORE           10                2026.05  42.3
CORE           5                 2026.05  40.0
MemRL          10                2026.05  39.3
GEPA           100               2026.05  35.3
GEPA           10                2026.05  31.0
Episodic RAG   100               2026.05  30.3
Episodic RAG   5                 2026.05  28.7
Episodic RAG   10                2026.05  24.3
No Learning    5                 2026.05  17.9
No Learning    10                2026.05  17.9
No Learning    100               2026.05  17.9
GRPO           10                2026.05  12.0
GRPO           100               2026.05  10.7
GRPO           5                 2026.05  7.7
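Because each method appears at three training-sample budgets (5, 10, 100), the results are easier to compare once they are pivoted by method and measured against the No Learning baseline at the same budget. The sketch below does that with the accuracies copied from the table above; the helper names `pivot` and `gain_over_baseline` are hypothetical and not part of the benchmark.

```python
from collections import defaultdict

# (method, training samples, accuracy) copied from the Evaluation Results table.
RESULTS = [
    ("MemRL", 100, 72.7),
    ("MemRL", 5, 51.7),
    ("GEPA", 5, 43.3),
    ("CORE", 100, 42.7),
    ("CORE", 10, 42.3),
    ("CORE", 5, 40.0),
    ("MemRL", 10, 39.3),
    ("GEPA", 100, 35.3),
    ("GEPA", 10, 31.0),
    ("Episodic RAG", 100, 30.3),
    ("Episodic RAG", 5, 28.7),
    ("Episodic RAG", 10, 24.3),
    ("No Learning", 5, 17.9),
    ("No Learning", 10, 17.9),
    ("No Learning", 100, 17.9),
    ("GRPO", 10, 12.0),
    ("GRPO", 100, 10.7),
    ("GRPO", 5, 7.7),
]


def pivot(results):
    """Hypothetical helper: group rows into {method: {training_samples: accuracy}}."""
    table = defaultdict(dict)
    for method, samples, acc in results:
        table[method][samples] = acc
    return dict(table)


def gain_over_baseline(results, baseline="No Learning"):
    """Hypothetical helper: accuracy minus the baseline's accuracy at the
    same training-sample budget, rounded to one decimal place."""
    table = pivot(results)
    base = table[baseline]
    return {
        method: {n: round(acc - base[n], 1) for n, acc in sorted(by_n.items())}
        for method, by_n in table.items()
        if method != baseline
    }


if __name__ == "__main__":
    for method, gains in gain_over_baseline(RESULTS).items():
        print(method, gains)
```

Since No Learning scores 17.9 at every budget, each gain is simply the method's accuracy minus 17.9. On this data GRPO comes out below the baseline at all three budgets, and neither MemRL nor GEPA improves monotonically with more training samples.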