GUI Agent Execution on WindowsAgentArena full-task
Top result: Qwen3-VL-8B, Full Task Success Rate 0.3076
[Chart: Full Task Success Rate over time; data point at Feb 6, 2026]
Updated 4d ago
Evaluation Results
| Method | Setting | Date | Full Task Success Rate |
|---|---|---|---|
| Qwen3-VL-8B | ANCHOR | 2026.02 | 0.3076 |
| Qwen3-VL-8B | Task-Driven | 2026.02 | 0.2747 |
| Qwen3-VL-8B | Zero-Shot | 2026.02 | 0.2307 |
| GLM4.1V-9B | ANCHOR | 2026.02 | 0.163 |
| Qwen2.5-VL-7B | ANCHOR | 2026.02 | 0.1522 |
| Qwen2.5-VL-7B | Task-Driven | 2026.02 | 0.141 |
| GLM4.1V-9B | Task-Driven | 2026.02 | 0.1319 |
| GLM4.1V-9B | Zero-Shot | 2026.02 | 0.0549 |
| Qwen2.5-VL-7B | Zero-Shot | 2026.02 | 0.0439 |
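
The results above can be compared per model by subtracting each model's Zero-Shot score from its ANCHOR and Task-Driven scores. The sketch below does that arithmetic on the listed numbers. The names `RESULTS` and `gain_over_zero_shot` are illustrative only, and "gain over Zero-Shot" is a derived quantity, not a metric reported by the benchmark page.

```python
# Scores copied from the results listed above:
# (model, setting) -> Full Task Success Rate.
RESULTS = {
    ("Qwen3-VL-8B", "ANCHOR"): 0.3076,
    ("Qwen3-VL-8B", "Task-Driven"): 0.2747,
    ("Qwen3-VL-8B", "Zero-Shot"): 0.2307,
    ("GLM4.1V-9B", "ANCHOR"): 0.163,
    ("GLM4.1V-9B", "Task-Driven"): 0.1319,
    ("GLM4.1V-9B", "Zero-Shot"): 0.0549,
    ("Qwen2.5-VL-7B", "ANCHOR"): 0.1522,
    ("Qwen2.5-VL-7B", "Task-Driven"): 0.141,
    ("Qwen2.5-VL-7B", "Zero-Shot"): 0.0439,
}


def gain_over_zero_shot(results):
    """Return {model: {setting: absolute gain over that model's Zero-Shot score}}.

    This is a hypothetical helper for reading the table, not part of the benchmark.
    """
    gains = {}
    for (model, setting), score in results.items():
        if setting == "Zero-Shot":
            continue  # the baseline itself has no gain to report
        baseline = results[(model, "Zero-Shot")]
        gains.setdefault(model, {})[setting] = round(score - baseline, 4)
    return gains


GAINS = gain_over_zero_shot(RESULTS)

if __name__ == "__main__":
    for model, per_setting in GAINS.items():
        for setting, gain in per_setting.items():
            print(f"{model:14s} {setting:12s} +{gain:.4f}")
```

On these numbers, the ANCHOR setting scores higher than Task-Driven, and Task-Driven higher than Zero-Shot, for all three models.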