| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| AndroidControl High | UI-TARS-7B | Task Match (TM) | 83.7 | 27 | 3d ago |
| MiniWob++ | LAMO-3B | Success Rate | 77.2 | 25 | 3d ago |
| GUI Odyssey | UI-TARS-7B | Task Metric (TM) | 94.6 | 15 | 3d ago |
| Average (AC-Real-TSR, MiniWob++, AndroidWorld) | UI-Copilot-7B | Success Rate (SR) | 38.7 | 13 | 3d ago |
| AndroidControl AC-Real | UI-S1-7B | PG | 32.4 | 13 | 3d ago |
| AITW Gen | UI-S1-7B | PG | 19.5 | 12 | 11d ago |
| WorldGUI Augmented 1.0 | | Success Rate (Office) | 83.5 | 11 | 1mo ago |
| WorldGUI Meta 1.0 | | Success Rate (Office) | 88.9 | 11 | 1mo ago |
| WindowsAgentArena | OS-SYMPHONY | Success Rate (Office) | 54.76 | 11 | 1mo ago |
| OSWorld Verified (test) | | Overall Success Rate | 61.92 | 9 | 1mo ago |
| TreeCUA OOD benchmark 1.0 (test) | TreeCUA-DPO-7B | SR | 3,080 | 3 | 1mo ago |
| Desktop GUI tasks Avg 1.0 (pilot) | GPA | Success Rate | 100 | 2 | 16d ago |
| Desktop GUI tasks Hard 1.0 (pilot) | GPA | Success Rate | 100 | 2 | 16d ago |
| Desktop GUI tasks Simple 1.0 (pilot) | GPA | Success Rate | 100 | 2 | 16d ago |