Web Agent Navigation on MiniWoB (full)
Best result: 77.1 Success Rate, Gemini 3.1 Pro (Apr 9, 2026)
Evaluation Results
| Method | Details | Date | Success Rate |
| --- | --- | --- | --- |
| Gemini 3.1 Pro | Agent harness=This work | 2026.04 | 77.1 |
| Qwen3.5-27B | Params=27B, Agent harn... | 2026.04 | 70.9 |
| Claude 3.5 Sonnet | Agent harness=BrowserG... | 2026.04 | 69.8 |
| A3-Qwen3.5-9B | Params=9B, Agent harne... | 2026.04 | 69 |
| o1-mini | Agent harness=BrowserG... | 2026.04 | 67.8 |
| Llama 3.1 405B | Params=405B, Agent har... | 2026.04 | 64.6 |
| GPT-4o | Agent harness=BrowserG... | 2026.04 | 63.8 |
| Qwen3.5-9B (base) | Params=9B, Agent harne... | 2026.04 | 63.2 |
| Llama 3.1 70B | Params=70B, Agent harn... | 2026.04 | 57.6 |
| GPT-4o-mini | Agent harness=BrowserG... | 2026.04 | 56.6 |