Operating System Agent Control on WindowsAgentArena
UltraCUA-7B: Success Rate 21.7
[Chart: Success Rate over time, Oct 20, 2025 to Jan 12, 2026]
Updated 7d ago
Evaluation Results
| Method | Details | Date | Success Rate |
| --- | --- | --- | --- |
| UltraCUA-7B | | 2025.10 | 21.7 |
| UI-TARS-1.5-7B | | 2025.10 | 18.1 |
| Qwen2-VL-7B | Training Data=OpenCUA... | 2025.10 | 13.5 |
| OS-SYMPHONY | Backbone=GPT-5, Step=50 | 2026.01 | 0.635 |
| OS-SYMPHONY | Backbone=GPT-5-Mini, S... | 2026.01 | 0.622 |
| Agent S3 | Backbone=GPT-5, Step=100 | 2026.01 | 0.566 |
| Agent S3 | Backbone=GPT-5, Step=50 | 2026.01 | 0.541 |
| UI-TARS-2 | Step=50 | 2026.01 | 0.506 |
| OS-SYMPHONY | Backbone=Qwen3-VL-32B-... | 2026.01 | 0.453 |
| UI-TARS-1.5-7B | Step=50 | 2026.01 | 0.421 |
| Qwen3-VL-32B-Instruct | Step=50 | 2026.01 | 0.317 |