| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| ALFWorld | SAMPO | Overall Success Rate | 97.71 | 118 | 9d ago |
| ALFWorld (test) | SELAUR | Success Rate | 96.87 | 67 | 1mo ago |
| WebShop | SAMPO | Success Rate | 84.02 | 36 | 13d ago |
| WebShop (test) | GiGPO + PA-MoE | Score | 93.1 | 28 | 1mo ago |
| WebShop | GPT-5 | Real | 39 | 24 | 3d ago |
| TextWorld | GPT-5 | Real | 100 | 24 | 3d ago |
| ScienceWorld Unseen (test) | ITPR | Success Rate | 58.94 | 24 | 1mo ago |
| ALFWorld Unseen | STEP-HRL | Success Rate | 97.76 | 23 | 10d ago |
| ALFWorld Seen | STEP-HRL | Success Rate | 97.86 | 23 | 10d ago |
| ScienceWorld Unseen | STEP-HRL | Success Rate | 77.81 | 23 | 10d ago |
| ScienceWorld Seen | STEP-HRL | Success Rate | 81.57 | 23 | 10d ago |
| Virtualhome | HISR | Success Rate | 59.1 | 15 | 29d ago |
| ALFWorld unseen (test) | ProxMO | Pick Success | 98.4 | 14 | 1mo ago |
| ALFWorld (qualitative context) | | Success Rate (SR) | 99 | 8 | 1mo ago |