Classic Control on MountainCar
[Chart: Success Rate; top result 100 (No Memory (Plain)); date shown: Jan 27, 2026]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Success Rate |
|---|---|---|---|
| No Memory (Plain) | Base LLM=Grok-3, Agent... | 2026.01 | 100 |
| Vanilla | Base LLM=Grok-3, Agent... | 2026.01 | 100 |
| Vanilla + GLOVE | Base LLM=Grok-3, Agent... | 2026.01 | 100 |
| MemoryBank | Base LLM=Grok-3, Agent... | 2026.01 | 100 |
| MemoryBank + GLOVE | Base LLM=Grok-3, Agent... | 2026.01 | 100 |
| Voyager | Base LLM=Grok-3, Agent... | 2026.01 | 100 |
| Voyager + GLOVE | Base LLM=Grok-3, Agent... | 2026.01 | 100 |
| Generative Agent | Base LLM=Grok-3, Agent... | 2026.01 | 100 |
| Generative Agent + GLOVE | Base LLM=Grok-3, Agent... | 2026.01 | 100 |
| Vanilla | Backbone=GPT-4o | 2026.01 | 100 |
| Vanilla + GLOVE | Backbone=GPT-4o, Augme... | 2026.01 | 100 |
| Voyager | Backbone=GPT-4o | 2026.01 | 100 |
| Voyager + GLOVE | Backbone=GPT-4o, Augme... | 2026.01 | 100 |
| Generative Agent | Backbone=GPT-4o | 2026.01 | 100 |
| MemoryBank | Backbone=GPT-4o | 2026.01 | 95 |
| MemoryBank + GLOVE | Backbone=GPT-4o, Augme... | 2026.01 | 95 |
| Generative Agent + GLOVE | Backbone=GPT-4o, Augme... | 2026.01 | 95 |
| No Memory (Plain) | Backbone=GPT-4o | 2026.01 | 85 |