Step Execution on AndroidControl High
Top result: UI-TARS-7B w/ TraceR1, Step Success Rate 75.3 (Mar 17, 2026)
Updated 1mo ago
Evaluation Results
Method                      Params      Date      Step Success Rate
UI-TARS-7B w/ TraceR1       7B w/ 8B    2026.03   75.3
UI-TARS-7B w/ GPT-4.1       7B w/ -     2026.03   74.8
UI-TARS-32B                 32B         2026.03   74.7
UI-TARS-7B                  7B          2026.03   72.5
InfiGUI-R1-3B               3B          2026.03   71.1
UI-TARS-2B                  2B          2026.03   68.9
OmniParser-v2.0 w/ GPT-4o   -           2026.03   58.8
GUI-R1-7B                   7B          2026.03   51.7
QwenVL2.5-7B                7B          2026.03   47.1
GUI-R1-3B                   3B          2026.03   46.5
QwenVL2.5-3B                3B          2026.03   38.9
OS-Atlas-7B                 7B          2026.03   29.8
OS-Atlas-4B                 4B          2026.03   22.7
GPT-4o                      -           2026.03   21.2
Claude-computer-use         -           2026.03   12.5