Grid Navigation on FrozenLake v1.0 (Drift I)
[Chart: Success Rate. Highlighted entry: Vanilla + GLOVE, 7,500 (Jan 27, 2026)]
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | Success Rate |
| --- | --- | --- | --- |
| Vanilla + GLOVE | Backbone=DeepSeek-V3.2 | 2026.01 | 7,500 |
| MemoryBank + GLOVE | Backbone=DeepSeek-V3.2 | 2026.01 | 7,000 |
| Voyager + GLOVE | Backbone=DeepSeek-V3.2 | 2026.01 | 6,500 |
| Generative Agent + GLOVE | Backbone=DeepSeek-V3.2 | 2026.01 | 6,500 |
| Voyager | Base LLM=Grok-3, Agent... | 2026.01 | 95 |
| Generative Agent + GLOVE | Base LLM=Grok-3, Agent... | 2026.01 | 85 |
| Vanilla + GLOVE | Base LLM=Grok-3, Agent... | 2026.01 | 80 |
| Vanilla | Base LLM=Grok-3, Agent... | 2026.01 | 75 |
| MemoryBank | Base LLM=Grok-3, Agent... | 2026.01 | 70 |
| Voyager + GLOVE | Base LLM=Grok-3, Agent... | 2026.01 | 70 |
| MemoryBank + GLOVE | Base LLM=Grok-3, Agent... | 2026.01 | 65 |
| No Memory (Plain) | Backbone=DeepSeek-V3.2 | 2026.01 | 0 |
| Vanilla | Backbone=DeepSeek-V3.2 | 2026.01 | 0 |
| MemoryBank | Backbone=DeepSeek-V3.2 | 2026.01 | 0 |
| Voyager | Backbone=DeepSeek-V3.2 | 2026.01 | 0 |
| Generative Agent | Backbone=DeepSeek-V3.2 | 2026.01 | 0 |
| No Memory (Plain) | Base LLM=Grok-3, Agent... | 2026.01 | 0 |
| Generative Agent | Base LLM=Grok-3, Agent... | 2026.01 | 0 |