GUI Agent Attack Success Rate Evaluation on MIRAGE (1,111-sample main set)
[Chart: FB Success Rate. Highlighted point: gpt-4o-mini, 41, May 27, 2026.]
Updated 6d ago
Evaluation Results
All numeric columns are success rates.

| Method | Details | Date | FB | WA | Amaz | IG | Shop | Spot | Tel | Temu | TikT | X | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-4o-mini | Model Type=Closed-weight | 2026.05 | 41 | 21 | 46 | 30 | 22 | 26 | 23 | 32 | 23 | 39 | 30.2 |
| Qwen3-VL-8B-Instruct | Model Size=8B, Backbon... | 2026.05 | 29 | 24 | 47 | 23 | 27 | 25 | 24 | 32 | 26 | 39 | 28.9 |
| GLM-4.5V | Backbone=GLM-4.5V | 2026.05 | 28 | 25 | 41 | 24 | 29 | 24 | 30 | 32 | 19 | 40 | 28.6 |
| Qwen3-VL-32B-Instruct | Model Size=32B, Backbo... | 2026.05 | 27 | 19 | 39 | 22 | 26 | 20 | 22 | 25 | 13 | 25 | 23 |
| Qwen3-VL-30B-A3B-Instruct | Model Size=30B, Backbo... | 2026.05 | 26 | 20 | 41 | 26 | 22 | 23 | 25 | 29 | 16 | 29 | 25 |