
WindowsAgentArena

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Windows UI Navigation | WindowsAgentArena (WAA) | Success Rate: 74.5 | 14 |
| GUI Automation | WindowsAgentArena | Success Rate (Office): 54.76 | 11 |
| GUI Agent Execution | WindowsAgentArena full-task | Full Task Success Rate: 0.3076 | 9 |
| Operating System Agent Control | WindowsAgentArena | Success Rate: 0.635 | 8 |
| Windows Computer Use Automation | WindowsAgentArena V2 | LibreOffice Success Rate: 7.1 | 7 |
| Agent Performance | WindowsAgentArena (test) | Office Score: 30.4 | 6 |
| Native Windows operating system task execution | WindowsAgentArena (WAA) | AUV: 9.5 | 4 |
| Windows Agent Task Completion | WindowsAgentArena (original) | Office Success Rate: 2.3 | 2 |