Web Navigation on WebShop unseen (test)
Best Score: 87.2 (ProxMO)
[Chart: Score and Success Rate over time, Feb 22, 2026]
Evaluation Results
| Method | Backbone | Date | Score | Success Rate (%) |
|---|---|---|---|---|
| ProxMO | Qwen2.5-7B-In... | 2026.02 | 87.2 | 76.5 |
| GiGPO | Qwen2.5-7B-In... | 2026.02 | 85.5 | 74.8 |
| ProxMO | Qwen2.5-1.5B-... | 2026.02 | 85.3 | 67.1 |
| GiGPO | Qwen2.5-1.5B-... | 2026.02 | 81.7 | 62.3 |
| GRPO | Qwen2.5-7B-In... | 2026.02 | 79.2 | 67.2 |
| GRPO | Qwen2.5-1.5B-... | 2026.02 | 73.1 | 52.2 |
| Reflexion | Qwen2.5-1.5B-... | 2026.02 | 58.6 | 23.5 |
| Reflexion | Qwen2.5-7B-In... | 2026.02 | 56.3 | 30.2 |
| ReAct | Qwen2.5-7B-In... | 2026.02 | 47.8 | 21.0 |
| Gemini-2.5-Pro | Closed-Source... | 2026.02 | 42.5 | 35.9 |
| ReAct | Qwen2.5-1.5B-... | 2026.02 | 42.1 | 14.3 |
| GPT-4o | Closed-Source... | 2026.02 | 31.8 | 23.7 |
| Base | Qwen2.5-1.5B-... | 2026.02 | 25.1 | 6.3 |
| Base | Qwen2.5-7B-In... | 2026.02 | 25.1 | 8.4 |
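Because the results mix two open-weight backbone sizes with closed-source models, comparing methods within each backbone group is often more informative than the overall ranking. The following is a minimal Python sketch that transcribes the rows from the results above and, for each backbone group, reports the best method, the runner-up, and the Score and Success Rate gaps between them. The short backbone labels ("7B", "1.5B", "closed") and the function name `top_two_margins` are illustrative assumptions, not part of the leaderboard.

```python
from collections import defaultdict

# Rows transcribed from the results above: (method, backbone, score, success_rate).
# Backbone labels are hypothetical short forms of the truncated names on the page.
ROWS = [
    ("ProxMO", "7B", 87.2, 76.5),
    ("GiGPO", "7B", 85.5, 74.8),
    ("ProxMO", "1.5B", 85.3, 67.1),
    ("GiGPO", "1.5B", 81.7, 62.3),
    ("GRPO", "7B", 79.2, 67.2),
    ("GRPO", "1.5B", 73.1, 52.2),
    ("Reflexion", "1.5B", 58.6, 23.5),
    ("Reflexion", "7B", 56.3, 30.2),
    ("ReAct", "7B", 47.8, 21.0),
    ("Gemini-2.5-Pro", "closed", 42.5, 35.9),
    ("ReAct", "1.5B", 42.1, 14.3),
    ("GPT-4o", "closed", 31.8, 23.7),
    ("Base", "1.5B", 25.1, 6.3),
    ("Base", "7B", 25.1, 8.4),
]


def top_two_margins(rows):
    """For each backbone group, return (best method, runner-up,
    Score gap, Success Rate gap), with gaps rounded to one decimal."""
    groups = defaultdict(list)
    for method, backbone, score, success_rate in rows:
        groups[backbone].append((score, success_rate, method))
    result = {}
    for backbone, entries in groups.items():
        entries.sort(reverse=True)  # highest Score first
        (s1, r1, m1), (s2, r2, m2) = entries[0], entries[1]
        result[backbone] = (m1, m2, round(s1 - s2, 1), round(r1 - r2, 1))
    return result


if __name__ == "__main__":
    for backbone, (lead, runner_up, d_score, d_sr) in sorted(top_two_margins(ROWS).items()):
        print(f"{backbone}: {lead} over {runner_up} by {d_score} Score, {d_sr} Success Rate")
```

Rounding the differences to one decimal matches the precision reported on the page and hides floating-point noise such as `87.2 - 85.5` evaluating to slightly more than 1.7.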