Web Navigation on WebShop Drift II
Top result: Vanilla + GLOVE, Success Rate 95 (Jan 27, 2026)
Evaluation Results
| Method | Configuration | Date | Success Rate |
| --- | --- | --- | --- |
| Vanilla + GLOVE | Base LLM=Grok-3, Agent... | 2026.01 | 95 |
| MemoryBank + GLOVE | Base LLM=Grok-3, Agent... | 2026.01 | 95 |
| Voyager + GLOVE | Base LLM=Grok-3, Agent... | 2026.01 | 95 |
| Vanilla + GLOVE | Backbone=GPT-4o, Augme... | 2026.01 | 95 |
| Voyager + GLOVE | Backbone=GPT-4o, Augme... | 2026.01 | 95 |
| Generative Agent + GLOVE | Backbone=GPT-4o, Augme... | 2026.01 | 95 |
| Generative Agent + GLOVE | Base LLM=Grok-3, Agent... | 2026.01 | 90 |
| MemoryBank + GLOVE | Backbone=GPT-4o, Augme... | 2026.01 | 85 |
| MemoryBank | Base LLM=Grok-3, Agent... | 2026.01 | 30 |
| MemoryBank | Backbone=GPT-4o | 2026.01 | 20 |
| No Memory (Plain) | Base LLM=Grok-3, Agent... | 2026.01 | 15 |
| Vanilla | Base LLM=Grok-3, Agent... | 2026.01 | 0 |
| Voyager | Base LLM=Grok-3, Agent... | 2026.01 | 0 |
| Generative Agent | Base LLM=Grok-3, Agent... | 2026.01 | 0 |
| No Memory (Plain) | Backbone=GPT-4o | 2026.01 | 0 |
| Vanilla | Backbone=GPT-4o | 2026.01 | 0 |
| Voyager | Backbone=GPT-4o | 2026.01 | 0 |
| Generative Agent | Backbone=GPT-4o | 2026.01 | 0 |