
WorkArena

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Web Agent | WorkArena L2 | Success Rate: 4.7 | 18 |
| Web Agent | WorkArena L1 | Success Rate: 38.8 | 18 |
| Web Navigation and Automation | WorkArena Held-out Tasks (test) | Success Rate: 70 | 16 |
| Web Navigation and Automation | WorkArena Held-out Goals (test) | Success Rate: 53.8 | 16 |
| Enterprise interface task completion | WorkArena L1 | Task Success Rate: 79.7 | 14 |
| Reward Modeling | WorkArena | Pairwise Accuracy: 84.33 | 13 |
| HTML observation reduction | WorkArena | Average Wall-Clock Time (seconds): 0.01 | 11 |
| Web Agent Navigation | WorkArena L2 147-task (test) | Success Rate: 40 | 10 |
| Web Agent Navigation | WorkArena L1 (full) | Success Rate: 79.4 | 10 |
| Enterprise interface task completion | WorkArena++ L2 | Success Rate: 41.6 | 9 |
| Web Task Automation | WorkArena L1 | Average Reward: 68 | 8 |
| Enterprise Workflow Automation | WorkArena (test) | M&D Score: 45.1 | 7 |
| Web Navigation | WorkArena L2 | Success Rate: 6.8 | 5 |
| Web Navigation | WorkArena L1 | Success Rate: 7.6 | 5 |
| Web agent interaction | WorkArena L1 | Cumulative Runtime (h/m): 0.8033 | 3 |
| Enterprise interface interaction | WorkArena L2 full benchmark | Success Rate: 69.4 | 3 |
| Enterprise interface interaction | WorkArena L2 (test) | Success Rate: 9.7 | 2 |