| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Web Navigation and Shopping | Webshop | Success Rate: 82.8 | 81 |
| E-commerce Navigation and Search | WebShop semantic shift Hidden drift | Score: 100 | 63 |
| Interactive web-based shopping tasks | WebShop | Score: 92.2 | 60 |
| Web-based Agent Interaction | WebShop (test) | Success Rate: 73 | 42 |
| Web-based Agent Interaction | WebShop | CoT Match Rate: 100 | 41 |
| Interactive Decision Making | WebShop | Success Rate: 84.02 | 36 |
| Web-based Agent Interaction | WebShop (val) | Success Rate: 84.4 | 31 |
| Agent Task | WebShop | Success Rate: 99 | 30 |
| Interactive Decision Making | WebShop (test) | Score: 93.1 | 28 |
| Web Navigation | WebShop Source | Success Rate: 100 | 27 |
| Interactive Decision-making | WebShop | Real: 39 | 24 |
| Prompt-level Targeted Bit-flip Attack | WebShop | CDA: 100 | 24 |
| Internal-trigger targeted bit-flip attack | WebShop (test) | CDA: 0.95 | 24 |
| Web Task | WebShop | Average Reward: 69.2 | 24 |
| Online Shopping | Webshop | LLM Score: 0.63 | 22 |
| World Modeling | Webshop (test) | Search: 100 | 20 |
| Web-based Reasoning | WebShop | Average Reasoning Length (tokens): 34.8 | 18 |
| Web Navigation | WebShop Drift II | Success Rate: 95 | 18 |
| Web Navigation | WebShop Drift I | Success Rate: 95 | 18 |
| Online Shopping | WebShop Source | Score: 100 | 18 |
| Web Navigation | WebShop Drift II - Semantic Shift | Success Rate: 95 | 18 |
| Web Navigation | WebShop Drift I - Semantic Shift | Success Rate: 95 | 18 |
| E-commerce Navigation and Search | WebShop semantic shift Source | Score: 1 | 18 |
| Agent Behavior Adaptation | WebShop (WS) (test) | Loop Ratio: 36.7 | 17 |
| Next-state prediction | WebShop (WS) | EM Accuracy: 79.05 | 16 |