| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| SkillLearnBench Random | Revision v3 | Success Count | 29 | 20 | 1d ago |
| WebShop | Claude-sonnet-4.5 | Real Success Score | 61 | 14 | 3mo ago |
| TextWorld | GPT-4-turbo | Real | 100 | 14 | 3mo ago |
| SciWorld | GPT-5 | Real | 68.21 | 14 | 3mo ago |
| ALFWorld | GPT-5 | Real Success | 91 | 14 | 3mo ago |
| MemGUI-Bench 1.0 (test) | | Pass@1 L1 | 66.7 | 11 | 5d ago |
| OSWorld (test) | Compressed-a11y | Chrome Success Rate | 25 | 4 | 1mo ago |
| User Study (Overall) | | Success Rate | 96 | 4 | 1mo ago |
| User Study (Trecharge5) | PlantFORM | Success Rate | 100 | 4 | 1mo ago |
| User Study (Tdischarge1) | | Success Rate | 89 | 4 | 1mo ago |
| User Study (Trecharge4) | PlantSCREEN | Success Rate | 96 | 4 | 1mo ago |
| User Study (Trecharge3) | PlantSCREEN | Success Rate | 100 | 4 | 1mo ago |
| User Study (Trecharge2) | | Success Rate | 100 | 4 | 1mo ago |
| User Study (Trecharge1) | | Success Rate | 100 | 4 | 1mo ago |