| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Web agent tasks | Mind2Web Cross-Task | Step Success Rate | 53.2 | 64 |
| Web Navigation Task Success | MIND2WEB ONLINE (test) | Task Success Rate (Overall) | 67 | 41 |
| Web navigation | Mind2Web | Overall Success Rate | 58.7 | 41 |
| Web agent tasks | Mind2Web (Cross-Website) | Element Accuracy | 57.2 | 40 |
| GUI Web Agent Navigation | Mind2Web Online | Overall Average Score | 67.3 | 37 |
| Web Navigation | Mind2Web Cross-Domain | Element Accuracy (EA) | 65.2 | 37 |
| Web agent tasks | Mind2Web Cross-Domain | Ele.Acc | 55.7 | 37 |
| GUI Navigation | Multimodal-Mind2Web Cross-Domain | Step Success Rate | 67.1 | 32 |
| GUI Navigation | Multimodal-Mind2Web Cross-Task | Step Success Rate | 71.5 | 32 |
| GUI Navigation | Mind2Web Cross-Task | Element Accuracy | 66 | 30 |
| Web Agent Navigation | Mind2Web Cross-Domain 1.0 | Success Rate | 445 | 26 |
| Web Agent Navigation | Mind2Web Cross-Task 1.0 | Success Rate | 49.2 | 26 |
| GUI Agent Navigation | Mind2Web | Success Rate | 45.68 | 24 |
| Adversarial Attack against WebExperT agent | Mind2Web 600 tasks (test) | ASR (Finance, pass@10) | 46.2 | 24 |
| Adversarial Attack against SeeAct agent | Mind2Web 600 tasks (test) | ASR Finance (pass@10) | 54.1 | 24 |
| GUI Navigation | Mind2Web (Cross-Website) | Element Accuracy | 44.6 | 23 |
| Web Navigation | MM-Mind2Web | Step Success Rate (SR) | 22.97 | 22 |
| Web Navigation Task Completion | Mind2Web Cross-task | Success Rate | 64.6 | 18 |
| Web Agent Navigation | Mind2Web All 1.0 | Element Accuracy | 0.484 | 16 |
| Web Action Generation Efficiency | Mind2Web (All) | Time to Proposal Steps | 363.6 | 16 |
| Web Action Generation Efficiency | Mind2Web Cross-Domain | To_Pro (Steps/Time) | 334.9 | 16 |
| Web Action Generation Efficiency | Mind2Web Cross-Website | To_Pro Steps/Time | 364.1 | 16 |
| Web Action Generation Efficiency | Mind2Web Cross-Task | Time to Procedure | 378.2 | 16 |
| Web Navigation | Mind2Web Live (test) | Task Completion Rate | 52.8 | 16 |
| Element Grounding | Multimodal-Mind2Web Cross-Task | Element Accuracy | 50.7 | 16 |