Web Shopping on WebShop Semantic Explicit structural drift (Drift I)
[Chart: Success Rate by date. Highlighted point: GLOVE, 95, Jan 27, 2026.]
Evaluation Results
Method             Details                    Date     Success Rate
GLOVE              Backbone=GPT-4o, Agent...  2026.01  95
GLOVE              Backbone=GPT-4o, Agent...  2026.01  90
GLOVE              Backbone=GPT-4o, Agent...  2026.01  90
GLOVE              Backbone=GPT-4o, Agent...  2026.01  85
MemoryBank         Backbone=GPT-4o, Agent...  2026.01  20
No Memory (Plain)  Backbone=GPT-4o, Agent...  2026.01  5
Vanilla            Backbone=GPT-4o, Agent...  2026.01  0
Voyager            Backbone=GPT-4o, Agent...  2026.01  0
Generative Agent   Backbone=GPT-4o, Agent...  2026.01  0