MiniWoB

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Web navigation | MiniWob++ | Accuracy | 53.26 | 15 |
| Agent | MiniWob++ (held-in) | Performance (%) | 87.12 | 14 |
| Web Agent Navigation | MiniWoB (full) | Success Rate | 77.1 | 10 |
| Web browsing agent security and task completion | MiniWob++ (unseen tasks) | Success Rate (ASR) | 18.9 | 7 |
| Web automation | MiniWob 45 tasks subset (test) | Mean Success Rate | 86.1 | 6 |
| Web-based task completion | MiniWoB++ With feedback 9 tasks | Success Rate | 91.11 | 5 |
| Web automation | MiniWob 35 tasks subset (test) | Mean Success Rate | 67 | 4 |
| enter-text navigation | MiniWoB (test) | Success Rate | 100 | 3 |