GUI Agent Interaction on OSWorld w/o Loop
Top result: 29.5 AUV, GPT-4o + ScaleCUA-7B (Feb 2, 2026)
Evaluation Results
Method                    Notes                        Date      AUV
GPT-4o + ScaleCUA-7B      Planner=GPT-4o, GUI gr...    2026.02   29.5
UI-TARS-1.5-7B                                         2026.02   28
UI-TARS-72B-DPO                                        2026.02   26.3
Claude3.7-Sonnet                                       2026.02   9
Qwen2.5-VL-72B-Instruct                                2026.02   8.3