Webshop

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| E-commerce Navigation and Search | WebShop semantic shift Hidden drift | Score | 100 | 63 |
| Web Navigation and Shopping | Webshop | Success Rate | 82.8 | 33 |
| Interactive Decision Making | WebShop (test) | Score | 93.1 | 28 |
| Interactive web-based shopping tasks | WebShop | Score | 92.2 | 28 |
| Web Navigation | WebShop Source | Success Rate | 100 | 27 |
| Web-based Agent Interaction | WebShop (test) | Success Rate | 73 | 25 |
| Web Task | WebShop | Average Reward | 69.2 | 24 |
| Online Shopping | Webshop | LLM Score | 0.63 | 22 |
| World Modeling | Webshop (test) | Search | 100 | 20 |
| Web Navigation | WebShop Drift II | Success Rate | 95 | 18 |
| Web Navigation | WebShop Drift I | Success Rate | 95 | 18 |
| Online Shopping | WebShop Source | Score | 100 | 18 |
| Web Navigation | WebShop Drift II - Semantic Shift | Success Rate | 95 | 18 |
| Web Navigation | WebShop Drift I - Semantic Shift | Success Rate | 95 | 18 |
| E-commerce Navigation and Search | WebShop semantic shift Source | Score | 1 | 18 |
| Agent Task | WebShop | Success Rate | 43.2 | 17 |
| Agent Behavior Adaptation | WebShop (WS) (test) | Loop Ratio | 36.7 | 17 |
| Next-state prediction | WebShop (WS) | EM Accuracy | 79.05 | 16 |
| Web Navigation | WebShop unseen (test) | Score | 87.2 | 14 |
| Task success | WebShop | Real Success Score | 61 | 14 |
| Web Navigation | WebShop Implicit Hidden Drift | Success Rate (Source) | 100 | 14 |
| Web Navigation | WebShop Explicit Structural Drift II | Success Rate (Source) | 95 | 14 |
| Web navigation | WebShop | Average Score | 71.3 | 13 |
| Web Navigation | WebShop (test) | Score | 0.8935 | 12 |
| Sequential Decision Making | WebShop | Score | 0.88 | 11 |
Showing 25 of 45 rows