Mobile Agent Safety and Capability Evaluation on Phone-Harm + Normal-150 (merged)
[Chart: Harm Rate (HR) by model; highlighted point: GPT-5, HR 0.2, Apr 10, 2026. Selectable metrics: Harm Rate (HR), Goal Achievement Rate (GAR), Instruction Following Rate 1 (IF1), Mean Harm Rate (mHR), Overall Incident Rate (OIR).]
Updated 6d ago
Evaluation Results
| Method | Setting | Date | Harm Rate (HR) | Goal Achievement Rate (GAR) | Instruction Following Rate 1 (IF1) | Mean Harm Rate (mHR) | Overall Incident Rate (OIR) |
|---|---|---|---|---|---|---|---|
| GPT-5 | Base | 2026.04 | 0.2 | 68.75 | 4.26 | 0.29 | 0.4 |
| Gemini-3 | Base | 2026.04 | 0.42 | 71.67 | 10.53 | 0.59 | 3.85 |
| AutoGLM-VLM | Base | 2026.04 | 1.79 | 72.22 | 39.56 | 2.48 | 0.79 |
| CORA | | 2026.04 | 2.19 | 89.69 | 85.29 | 2.44 | 9.83 |
| UI-TARS-1.5 | Base | 2026.04 | 2.87 | 77.28 | 57.14 | 3.71 | 3.85 |
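The results listed above can be compared programmatically. The following is a minimal sketch, not part of the benchmark's own tooling: the `Result` type and the `rank_by` helper are hypothetical names introduced here, and the reading that lower HR is better while higher GAR is better is an assumption based on the metric names, since the page does not define the metrics.

```python
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    """One leaderboard row (hypothetical container, values copied from the table)."""
    method: str
    setting: Optional[str]  # None where the page lists no setting (CORA)
    hr: float    # Harm Rate
    gar: float   # Goal Achievement Rate
    if1: float   # Instruction Following Rate 1
    mhr: float   # Mean Harm Rate
    oir: float   # Overall Incident Rate


RESULTS = [
    Result("GPT-5", "Base", 0.2, 68.75, 4.26, 0.29, 0.4),
    Result("Gemini-3", "Base", 0.42, 71.67, 10.53, 0.59, 3.85),
    Result("AutoGLM-VLM", "Base", 1.79, 72.22, 39.56, 2.48, 0.79),
    Result("CORA", None, 2.19, 89.69, 85.29, 2.44, 9.83),
    Result("UI-TARS-1.5", "Base", 2.87, 77.28, 57.14, 3.71, 3.85),
]


def rank_by(results: List[Result], metric: str, descending: bool = False) -> List[Result]:
    """Return rows sorted by the named metric field (stable for ties)."""
    return sorted(results, key=lambda r: getattr(r, metric), reverse=descending)


# Assumption: lower HR is safer, higher GAR is more capable.
lowest_harm = rank_by(RESULTS, "hr")[0]
highest_goal = rank_by(RESULTS, "gar", descending=True)[0]
print(lowest_harm.method, highest_goal.method)  # prints "GPT-5 CORA"
```

Under those assumptions, the row with the lowest harm rate (GPT-5) and the row with the highest goal achievement rate (CORA) are different methods, so any single ranking depends on which metric is chosen.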