| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| Multimodal-Mind2Web Cross-Website | HTML-T5-XL | Step Success Rate | 62.2 | 37 | 1mo ago |
| AndroidWorld latest (test) | MAI-UI-235B-A22B | Success Rate | 76.7 | 35 | 3mo ago |
| OSWorld Verified | VLAA-GUI w/ Gemini 3 Flash | OS Success Rate | 91.7 | 32 | 1mo ago |
| Multimodal-Mind2Web Cross-Domain | HTML-T5-XL | Step Success Rate | 67.1 | 32 | 1mo ago |
| Multimodal-Mind2Web Cross-Task | HTML-T5-XL | Step Success Rate | 71.5 | 32 | 1mo ago |
| Mind2Web Cross-Task | MindAct | Element Accuracy | 66 | 30 | 2mo ago |
| AITW | Qwen-GUI | Overall Success Rate | 67.3 | 27 | 1mo ago |
| AITW (test) | CogAgent | Install Success Rate | 78.86 | 27 | 3mo ago |
| Mind2Web (Cross-Website) | DuSAR | Element Accuracy | 44.6 | 23 | 3mo ago |
| GUI-Odyssey | Working + Episodic Memory | AMS | 68.32 | 22 | 6d ago |
| MMG2Skill-Bench GUI | MMG2Skill | Success Rate | 77.25 | 18 | 1d ago |
| AndroidControl High | UILoop-7B | SR (Success Rate) | 76.3 | 17 | 1mo ago |
| Smartphone (test) | MiniCPM-GUI | Type EM | 76.1 | 14 | 3mo ago |
| Web-Multi (test) | Qwen-GUI | Type EM | 73.1 | 14 | 3mo ago |
| Web-Single (test) |  | Type EM | 0.81 | 14 | 3mo ago |
| Mind2Web Online (Average) | Ovis2.5S-GRPO | Success Rate | 64 | 13 | 1mo ago |
| OmniAct-D | NaviMaster | Goal Rate (GR) | 82.16 | 12 | 2mo ago |
| OmniAct-W | NaviMaster | Goal Rate (GR) | 85.3 | 12 | 2mo ago |
| GuiAct-W | NaviMaster | Success Rate (GR) | 91.95 | 12 | 2mo ago |
| Llamatouch | NaviMaster | Goal Rate | 82.54 | 12 | 2mo ago |
| GuiAct-P | NaviMaster | Goal Rate (GR) | 76.08 | 12 | 2mo ago |
| AC High | NaviMaster | Goal Rate (GR) | 78.15 | 12 | 2mo ago |
| AC Low | NaviMaster | Goal Rate | 94.4 | 12 | 2mo ago |
| Online-Mind2Web (Easy) | Ovis2.5S-RF++ | Success Rate | 78.31 | 12 | 1mo ago |
| WebVoyager | Ovis2.5S-RLOO | Success Rate (Allrecipes) | 88.89 | 12 | 15d ago |