
Webshop

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Web Navigation and Shopping | Webshop | Score 100 | 153 |
| Interactive Decision Making | WebShop | Success Rate 84.02 | 70 |
| E-commerce Navigation and Search | WebShop semantic shift Hidden drift | Score 100 | 63 |
| Online Shopping | Webshop | Score 91.5 | 61 |
| Interactive web-based shopping tasks | WebShop | Score 92.2 | 60 |
| Online Shopping | WebShop (test) | Score 95 | 59 |
| Web Shopping Agent | WebShop | Score 91.1 | 53 |
| Agent Task | WebShop | Success Rate 99 | 50 |
| Agentic reasoning | WebShop | Success Rate 65.58 | 45 |
| Web-based Agent Interaction | WebShop (test) | Success Rate 73 | 42 |
| Web-based Agent Interaction | WebShop | CoT Match Rate 100 | 41 |
| Interactive Decision Making | WebShop (Seen) | Average Reward 62.3 | 40 |
| Interactive Decision Making | WebShop (test) | Success Rate 97 | 37 |
| Web navigation | WebShop | Success Rate 76 | 32 |
| Web-based Agent Interaction | WebShop (val) | Success Rate 84.4 | 31 |
| Mean Reward | WebShop | Mean Reward 63.7 | 30 |
| Online shopping agent navigation | WebShop 128 (val) | Score 89.4 | 30 |
| Web Navigation | WebShop Source | Success Rate 100 | 27 |
| Interactive Decision-making | WebShop | Real 39 | 24 |
| Prompt-level Targeted Bit-flip Attack | WebShop | CDA 100 | 24 |
| Internal-trigger targeted bit-flip attack | WebShop (test) | CDA 0.95 | 24 |
| Web Task | WebShop | Average Reward 69.2 | 24 |
| Interactive Environment Task Completion | WebShop (Seen) | Average Reward 86.2 | 22 |
| World Modeling | Webshop (test) | Search 100 | 20 |
| E-commerce product search and purchase | WebShop | Strict Success 81.3 | 19 |
Showing 25 of 83 rows