Agentic UI Interaction on GUIRILLA-TASK
[Chart: Input Success Rate. Highlighted point: Qwen 2.5 VL 3B, 12.5, Oct 16, 2025. Selectable metrics: Input Success Rate, Click Success Rate, Overall Success Rate.]
Updated 25d ago
Evaluation Results
| Method | Details | Date | Input Success Rate | Click Success Rate | Overall Success Rate |
|---|---|---|---|---|---|
| Qwen 2.5 VL 3B | Model Size=3B, Base Mo... | 2025.10 | 12.5 | 42.95 | 40.77 |
| Claude Computer Use | | 2025.10 | 8.93 | 65.59 | 61.53 |
| OpenAI Computer Use | | 2025.10 | 8.04 | 68.75 | 64.41 |
| OS-Atlas-Pro-7B | Model Size=7B | 2025.10 | 7.14 | 62.84 | 58.85 |
| UI TARS 2B | Model Size=2B | 2025.10 | 7.14 | 50.24 | 47.16 |
| CogAgent 9B | Model Size=9B | 2025.10 | 3.57 | 15.83 | 14.95 |
| Qwen 2.5 VL 7B | Model Size=7B, Base Mo... | 2025.10 | 2.68 | 39.16 | 36.55 |
| UI TARS 1.5 7B | Model Size=7B, Version... | 2025.10 | 1.79 | 54.65 | 50.86 |
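The results above are listed in descending order of Input Success Rate, which puts a different model on top than the other two metrics would. The following is a minimal sketch, not part of the benchmark's own tooling, that re-ranks the same rows by any of the three columns. The names `RESULTS`, `METRICS`, and `rank_by` are hypothetical helpers introduced only for illustration; the numbers are copied from the listing above.

```python
# Rows copied from the results above: (method, input, click, overall).
# RESULTS, METRICS and rank_by are illustrative names, not a benchmark API.
RESULTS = [
    ("Qwen 2.5 VL 3B", 12.5, 42.95, 40.77),
    ("Claude Computer Use", 8.93, 65.59, 61.53),
    ("OpenAI Computer Use", 8.04, 68.75, 64.41),
    ("OS-Atlas-Pro-7B", 7.14, 62.84, 58.85),
    ("UI TARS 2B", 7.14, 50.24, 47.16),
    ("CogAgent 9B", 3.57, 15.83, 14.95),
    ("Qwen 2.5 VL 7B", 2.68, 39.16, 36.55),
    ("UI TARS 1.5 7B", 1.79, 54.65, 50.86),
]

# Map a metric name to its position in each row tuple.
METRICS = {"input": 1, "click": 2, "overall": 3}


def rank_by(metric):
    """Return the rows sorted from highest to lowest on the chosen metric."""
    col = METRICS[metric]
    return sorted(RESULTS, key=lambda row: row[col], reverse=True)


if __name__ == "__main__":
    # Print the listing re-ranked by Overall Success Rate.
    for name, inp, click, overall in rank_by("overall"):
        print(f"{name:<22}{inp:>7.2f}{click:>8.2f}{overall:>9.2f}")
```

Calling `rank_by("input")` reproduces the order shown on the page, while `rank_by("click")` or `rank_by("overall")` gives the ordering under the other two metrics.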