Web Navigation on WebShop Drift II - Semantic Shift
[Chart: Success Rate over time. Top result: Voyager + GLOVE, 95 (Jan 27, 2026).]
Evaluation Results
| Method | LLM Backbone | Date | Success Rate |
|---|---|---|---|
| Voyager + GLOVE | Llama3.3-70B | 2026.01 | 95 |
| Generative Agent + GLOVE | Llama3.3-70B | 2026.01 | 90 |
| Voyager + GLOVE | Qwen3-30B | 2026.01 | 90 |
| Generative Agent + GLOVE | Qwen3-30B | 2026.01 | 90 |
| Vanilla + GLOVE | Llama3.3-70B | 2026.01 | 85 |
| MemoryBank + GLOVE | Llama3.3-70B | 2026.01 | 85 |
| Vanilla + GLOVE | Qwen3-30B | 2026.01 | 85 |
| MemoryBank + GLOVE | Qwen3-30B | 2026.01 | 80 |
| MemoryBank | Llama3.3-70B | 2026.01 | 30 |
| MemoryBank | Qwen3-30B | 2026.01 | 30 |
| No Memory (Plain) | Llama3.3-70B | 2026.01 | 0 |
| Vanilla | Llama3.3-70B | 2026.01 | 0 |
| Voyager | Llama3.3-70B | 2026.01 | 0 |
| Generative Agent | Llama3.3-70B | 2026.01 | 0 |
| No Memory (Plain) | Qwen3-30B | 2026.01 | 0 |
| Vanilla | Qwen3-30B | 2026.01 | 0 |
| Voyager | Qwen3-30B | 2026.01 | 0 |
| Generative Agent | Qwen3-30B | 2026.01 | 0 |