Mobile Agent Safety and Capability Evaluation on Phone-Harm + Normal-150 (merged)

Harm Rate (HR): 0.2

GPT-5

[Chart: Harm Rate (HR); Apr 10, 2026]
Updated 6d ago

Evaluation Results

Method  Links
2026.04
0.268.754.260.290.4
2026.04
0.4271.6710.530.593.85
2026.04
1.7972.2239.562.480.79
2026.04
2.1989.6985.292.449.83
2026.04
2.8777.2857.143.713.85