Web-based Interaction on MiniWoB
Top result: Gemini 3 Pro, Success Rate 74.7
[Chart: Success Rate by date; date shown: Apr 9, 2026]
Updated 9d ago
Evaluation Results
| Method              | Details                   | Date    | Success Rate |
|---------------------|---------------------------|---------|--------------|
| Gemini 3 Pro        | Category=Proprietary,...  | 2026.04 | 74.7         |
| Gemini 3.1 Flash L. | Category=Proprietary      | 2026.04 | 74.1         |
| GPT-5               | Harness=GenericAgent,...  | 2026.04 | 71.5         |
| Qwen3.5-27B         | Category=Open-weight (... | 2026.04 | 70.9         |
| Claude 4 Sonnet     | Harness=GenericAgent,...  | 2026.04 | 70.7         |
| A3-Qwen3.5-9B       | Category=A3 fine-tuned... | 2026.04 | 69           |
| A3-Qwen3.5-9B       | Harness=GenericAgent,...  | 2026.04 | 69           |
| A3-Qwen3.5-4B       | Category=A3 fine-tuned... | 2026.04 | 66.9         |
| GPT-oss-120B        | Harness=GenericAgent,...  | 2026.04 | 66.4         |
| Qwen3.5-9B          | Category=Open-weight (... | 2026.04 | 63.2         |
| Qwen3.5-9B (base)   | Harness=GenericAgent,...  | 2026.04 | 63.2         |
| Qwen3.5-4B          | Category=Open-weight (... | 2026.04 | 61.1         |
| A3-Qwen3.5-2B       | Category=A3 fine-tuned... | 2026.04 | 38.6         |
| Qwen3.5-2B          | Category=Open-weight (... | 2026.04 | 11.8         |