Safe Navigation on AndroidWorld core20 safe general tasks
[Chart: Success Count (out of 20); GPT-5: 11; Apr 10, 2026. Other views: Success Rate, Relative Performance vs GPT-5.]
Evaluation Results
| Method | Setting | Date | Success Count (out of 20) | Success Rate (%) | Relative Performance vs GPT-5 (%) |
| GPT-5 | Base | 2026.04 | 11 | 55 | 100 |
| CORA | Ours | 2026.04 | 8 | 40 | 72.7 |
| AutoGLM | Prev setting | 2026.04 | 6 | 30 | 54.5 |
| UI-TARS-1.5 | Base, avg of 3... | 2026.04 | 4 | 20 | 36.4 |
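
The page does not state how the derived metrics are computed, but the reported figures are consistent with Success Rate being the success count divided by 20 tasks, and Relative Performance being the success count divided by GPT-5's count, both expressed as percentages. The following is a minimal sketch under that assumption; the function names are hypothetical, and the UI-TARS-1.5 count is an average over several runs, so its raw value may not be an exact integer.

```python
# Success counts out of 20 tasks, copied from the results above.
TOTAL_TASKS = 20
success_counts = {
    "GPT-5": 11,
    "CORA": 8,
    "AutoGLM": 6,
    "UI-TARS-1.5": 4,  # reported as an average over multiple runs
}


def success_rate(count, total=TOTAL_TASKS):
    """Success rate as a percentage of tasks completed (assumed formula)."""
    return 100.0 * count / total


def relative_performance(count, baseline_count):
    """Success count as a percentage of the baseline's count (assumed formula)."""
    return 100.0 * count / baseline_count


baseline = success_counts["GPT-5"]
for method, count in success_counts.items():
    rate = success_rate(count)
    rel = relative_performance(count, baseline)
    print(f"{method}: {rate:.0f}% success rate, {rel:.1f}% relative to GPT-5")
```

Because both metrics are linear in the success count, the ranking is the same under all three views; only the scale changes.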