| Dataset Name | SOTA Method | Metric | Trend | |
|---|---|---|---|---|
| WebVoyager (test) | GPT-5 (SoM) | Success Rate: 90.6 | 18 | 19d ago |
| WebVoyager, Online-Mind2Web, DeepShop (test average) | Gemini computer-use-preview | Average Success Rate: 69.3 | 17 | 19d ago |
| DeepShop (test) | Orchard-GUI-4B (SFT + RL) | Success Rate: 64 | 17 | 19d ago |