Web Navigation on WebShop Drift I - Semantic Shift
[Chart: Success Rate by date. Top result: Vanilla + GLOVE, 95 (Jan 27, 2026).]
Updated 4d ago
Evaluation Results
| Method | LLM Backbone | Date | Success Rate |
| --- | --- | --- | --- |
| Vanilla + GLOVE | Llama3.3-70B | 2026.01 | 95 |
| MemoryBank + GLOVE | Llama3.3-70B | 2026.01 | 95 |
| Voyager + GLOVE | Llama3.3-70B | 2026.01 | 95 |
| Generative Agent + GLOVE | Llama3.3-70B | 2026.01 | 95 |
| Vanilla + GLOVE | Qwen3-30B | 2026.01 | 95 |
| MemoryBank + GLOVE | Qwen3-30B | 2026.01 | 95 |
| Voyager + GLOVE | Qwen3-30B | 2026.01 | 95 |
| Generative Agent + GLOVE | Qwen3-30B | 2026.01 | 95 |
| No Memory (Plain) | Llama3.3-70B | 2026.01 | 50 |
| MemoryBank | Llama3.3-70B | 2026.01 | 35 |
| MemoryBank | Qwen3-30B | 2026.01 | 35 |
| No Memory (Plain) | Qwen3-30B | 2026.01 | 30 |
| Vanilla | Llama3.3-70B | 2026.01 | 0 |
| Voyager | Llama3.3-70B | 2026.01 | 0 |
| Generative Agent | Llama3.3-70B | 2026.01 | 0 |
| Vanilla | Qwen3-30B | 2026.01 | 0 |
| Voyager | Qwen3-30B | 2026.01 | 0 |
| Generative Agent | Qwen3-30B | 2026.01 | 0 |