| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| ALFWorld | ReAct | Success Rate | 94 | 96 | 12d ago |
| EB-Habitat | Claude-4.6 | Avg Success Rate | 77 | 63 | 7d ago |
| ALFRED EB | Claude-4.6 | Avg Score | 92 | 36 | 7d ago |
| ALFRED unseen (test) | Prompter | Success Rate | 4,572 | 26 | 7d ago |
| ALFRED seen (test) | Prompter | Success Rate (SR) | 53.23 | 26 | 7d ago |
| ALFWorld | RLSD | Pick Success Rate | 100 | 21 | 6d ago |
| ALFWorld ID | SKILL0 | Pick Success Rate | 94.3 | 18 | 6d ago |
| ALFWorld (test) | MASA | Pick Success Rate | 85.7 | 16 | 2d ago |
| ALFWorld (IOD) | SKILLGEN | Accuracy | 96.67 | 16 | 21d ago |
| ALFWorld (All tasks) | Evolving-RL | Overall Success Rate | 96 | 16 | 6d ago |
| ALFWorld v1.0 (test) | ReAct | Pick Success Rate | 100 | 15 | 21d ago |
| AI2-THOR (test) | ReCAPA | SR | 75 | 11 | 1mo ago |
| ALFWorld (Unseen tasks) | Evolving-RL | Look Success Rate | 100 | 10 | 21d ago |
| ALFWorld (Seen tasks) | Evolving-RL | Pick Success Rate | 97.5 | 10 | 21d ago |
| ALFWorld in-distribution held-out (test) | SkillFlow | Success Rate | 96.09 | 9 | 19d ago |
| ALFWorld (OOD) | SKILLGEN | Accuracy | 97.25 | 8 | 21d ago |
| ALFWorld (Mixed) | DPEPO | In-Domain Completion Rate | 98.6 | 7 | 1mo ago |
| ALFRED Unseen (val) | Seq2Seq + PM R2VLM | Task Success Rate (TSR) | 20 | 3 | 2mo ago |
| ALFRED seen (val) | Seq2Seq + PM R2VLM | Task Success Rate (SR) | 3.4 | 3 | 2mo ago |
| ALFWorld out-of-distribution (test) | Meta-RL | Task Success Score (Cool) | 0.81 | 3 | 3mo ago |