GUI reasoning on Aggregate (GUI-Act-Web, OmniAct-Web, AndroidControl)
Top result: CAPO, 74.6 Overall Accuracy (Dec 2, 2025)
Evaluation Results
Method        Prompting   Date      Overall Accuracy
CAPO          zero-shot   2025.12   74.6
GRPO          zero-shot   2025.12   70.79
QwenVL2.5-3B  zero-shot   2025.12   54.09
Os-Atlas-4B   zero-shot   2025.12   49.75