Mobile UI Control on Robustness Benchmark
[Chart: LR and RSR by method. Highlighted point: LR 43.3 (OS-Atlas-7B), Apr 7, 2026.]
Updated 11d ago
Evaluation Results
| Method         | Category         | Date    | LR   | RSR  |
|----------------|------------------|---------|------|------|
| OS-Atlas-7B    | Open-source 7... | 2026.04 | 43.3 | 10.7 |
| Qwen2.5-VL-3B  | Open-source 3B   | 2026.04 | 30   | 35   |
| UI-R1-3B       | Open-source 3B   | 2026.04 | 29.5 | 29   |
| VeriGUI-3B     | Ours             | 2026.04 | 24.3 | 51.1 |
| Qwen2.5-VL-7B  | Open-source 7... | 2026.04 | 23.5 | 34.1 |
| UI-S1-7B       | Open-source 7... | 2026.04 | 20.5 | 43.6 |
| VeriGUI-7B     | Ours             | 2026.04 | 15.6 | 52.5 |
| Gemini-3-flash | Closed-source    | 2026.04 | 15   | 37   |
| UI-TARS-7B     | Open-source 7... | 2026.04 | 13.4 | 45.5 |
| GPT-5.1        | Closed-source    | 2026.04 | 11   | 21.9 |