| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| GUI Grounding | OSWorld-G | Average Score | 72.7 | 144 |
| GUI Grounding | OSWorld-G (test) | Element Accuracy | 78.4 | 52 |
| Computer Use | OSWorld | OS Success Rate | 75 | 45 |
| OS GUI Agentic Task Execution | OSWorld 361 tasks (Verified) | OS Success Rate | 79.17 | 43 |
| Operating System GUI Agentic Reasoning | OSWorld | Success Rate | 64.29 | 42 |
| GUI Automation | OSWorld Verified (test) | Overall Success Rate | 61.92 | 40 |
| UI Agent Evaluation | OSWorld | SR (15 Steps) | 40.3 | 34 |
| GUI Navigation | OSWorld Verified | OS Success Rate | 91.7 | 32 |
| GUI Agent Interaction | OSWorld | Average Accuracy | 42.5 | 24 |
| Computer task execution | OSWorld (verified) | Office Task Score | 64.8 | 24 |
| Grounding | OSWorld | Overall Score | 64.7 | 22 |
| GUI Agent Task Completion | OSWorld 1.0 (test) | Success Rate (GIMP) | 82.05 | 20 |
| Grounding | OSworld G-R | Accuracy | 76.4 | 19 |
| Interactive Desktop Task Success | OSWorld | Chrome Success Rate | 59.91 | 18 |
| GUI Grounding | OSWorld G-Refine v1.0 (test) | Overall Success Rate | 75 | 17 |
| GUI Agent Interaction | OSWorld | Success Rate (Max Steps: 15) | 42.9 | 16 |
| End-to-End Environment Interaction | OSWorld-Verified (test) | Pass@1 | 61.4 | 16 |
| GUI Agent Task Success | OSWorld | Success Rate | 24.4 | 16 |
| Task accuracy | OSWorld | Task Accuracy | 41.49 | 15 |
| Multimodal Task Accuracy | OSWorld | Multimodal Task Accuracy | 41.49 | 15 |
| Attack Success Rate (ASR) Evaluation | OSWorld (885-sample split) | Eligible Rate | 98.08 | 15 |
| GUI Grounding | OSWorld-G refined annotation | Text Match | 83.5 | 14 |
| Computer Use Agent Navigation | OSWorld (Verified) | Success Rate | 78.7 | 13 |
| End-to-end task execution | OSWorld (test) | Success Rate | 38.54 | 12 |
| Computer Use | OSWorld (Verified) | Score | 75 | 12 |