Offline GUI Agent Evaluation on AndroidControl High
[Chart: Type Accuracy and Step Success Rate. Highlighted result: UI-TARS-7B, Type Accuracy 83.7, May 14, 2026.]
Updated 19d ago
Evaluation Results
Method                      Attribute                    Date      Type Accuracy   Step Success Rate
--------------------------  ---------------------------  --------  --------------  -----------------
UI-TARS-7B                  Model Category=Open-so...    2026.05   83.7            72.5
Mimo-VL-7B + WildGUI        Pre-training=WildGUI         2026.05   80.6            71.4
Mimo-VL-7B                  Model Category=Open-so...    2026.05   76.3            65.6
Qwen2.5-VL-7B*              Model Category=Open-so...    2026.05   75.1            62.9
Qwen2.5-VL-7B* + WildGUI    Pre-training=WildGUI         2026.05   74.6            64.5
OS-Atlas-7B                 Model Category=Open-so...    2026.05   70.4            56.5
GPT-4o                      Model Category=Closed-...    2026.05   66.3            20.8
OS-Genesis-7B               Model Category=Open-so...    2026.05   65.9            44.4
Aguvis-7B                   Model Category=Open-so...    2026.05   65.6            54.2