Enterprise Interface Task Completion on WorkArena L1
Chart: Task Success Rate by date (top result: Gemini 3 Pro, 79.7; Apr 9, 2026)
Evaluation Results
| Method | Configuration | Date | Task Success Rate (%) |
|---|---|---|---|
| Gemini 3 Pro | Category=Proprietary,... | 2026.04 | 79.7 |
| GPT-5 | Harness=GenericAgent,... | 2026.04 | 79.1 |
| Claude 4 Sonnet | Harness=GenericAgent,... | 2026.04 | 63.3 |
| Gemini 3.1 Flash L. | Category=Proprietary | 2026.04 | 58.5 |
| Qwen3.5-27B | Category=Open-weight (... | 2026.04 | 57 |
| A3-Qwen3.5-9B | Category=A3 fine-tuned... | 2026.04 | 51.5 |
| A3-Qwen3.5-9B | Harness=GenericAgent,... | 2026.04 | 51.5 |
| GPT-oss-120B | Harness=GenericAgent,... | 2026.04 | 50.9 |
| A3-Qwen3.5-4B | Category=A3 fine-tuned... | 2026.04 | 44.8 |
| Qwen3.5-4B | Category=Open-weight (... | 2026.04 | 33.6 |
| Qwen3.5-9B | Category=Open-weight (... | 2026.04 | 33.3 |
| Qwen3.5-9B (base) | Harness=GenericAgent,... | 2026.04 | 33.3 |
| A3-Qwen3.5-2B | Category=A3 fine-tuned... | 2026.04 | 6.7 |
| Qwen3.5-2B | Category=Open-weight (... | 2026.04 | 4.2 |