GUI Reasoning on OmniAct-Web
[Chart: success rate on OmniAct-Web. Series: Type Success Rate, Goal Success Rate, Specific Success Rate. Highlighted point: CAPO, Type Success Rate 87.24, Dec 2, 2025.]
Evaluation Results
Method        Setting              Date     Type Success Rate  Goal Success Rate  Specific Success Rate
CAPO          Prompting=zero-shot  2025.12  87.24              74.02              74.16
GRPO          Prompting=zero-shot  2025.12  79.02              71.1               70.76
QwenVL2.5-3B  Prompting=zero-shot  2025.12  50.63              46.89              47.02
Os-Atlas-4B   Prompting=zero-shot  2025.12  46.74              49.24              22.99