WorkArena

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Web Navigation and Automation | WorkArena Held-out Tasks (test) | Success Rate: 70 | 16 |
| Web Navigation and Automation | WorkArena Held-out Goals (test) | Success Rate: 53.8 | 16 |
| Enterprise interface task completion | WorkArena L1 | Task Success Rate: 79.7 | 14 |
| Reward Modeling | WorkArena | Pairwise Accuracy: 84.33 | 13 |
| Web Agent Navigation | WorkArena L2 147-task (test) | Success Rate: 40 | 10 |
| Web Agent Navigation | WorkArena L1 (full) | Success Rate: 79.4 | 10 |
| Enterprise interface task completion | WorkArena++ L2 | Success Rate: 41.6 | 9 |
| Web Task Automation | WorkArena L1 | Average Reward: 68 | 8 |
| Enterprise interface interaction | WorkArena L2 full benchmark | Success Rate: 69.4 | 3 |
| Enterprise interface interaction | WorkArena L2 (test) | Success Rate: 9.7 | 2 |