| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Interactive Decision-making | ALFWorld | Overall Success Rate | 99.6 | 295 |
| Embodied Task | ALFWorld | Overall Success Rate | 97.5 | 169 |
| Embodied Reasoning | ALFWorld | Accuracy | 0.96 | 151 |
| Embodied Task Completion | ALFWorld | Success Rate | 94 | 96 |
| Instruction Following | ALFWorld | Accuracy | 89.3 | 82 |
| Interactive Decision Making | ALFWorld (test) | Success Rate | 96.87 | 71 |
| Embodied Decision Making | ALFWorld held-out (test) | Score | 95.5 | 49 |
| Agentic reasoning | ALFWorld | Success Rate | 76.02 | 45 |
| Interactive Task Completion | ALFWorld | Pick Success Rate | 100 | 45 |
| Embodied Agent Task | ALFWorld Unseen | Success Rate | 79.1 | 40 |
| Agent Task | AlfWorld | Success Rate | 86.7 | 40 |
| Instruction Following | ALFWorld (val seen) | Success Rate (SR) | 88.57 | 39 |
| Embodied Instruction Following | AlfWorld | Average Success Rate | 99.3 | 33 |
| Interactive Decision Making | ALFWorld Unseen | Success Rate | 97.76 | 32 |
| Interactive Decision Making | ALFWorld Seen | Success Rate | 97.86 | 32 |
| Multi-turn Agent Interaction | ALFWorld (test) | Success Rate (Pick) | 100 | 31 |
| Interactive Environment Task Completion | ALFWorld (Unseen) | Average Reward | 91.8 | 31 |
| Interactive Environment Task Completion | ALFWorld (Seen) | Average Reward | 90.2 | 31 |
| Embodied Agent | ALFWorld | Success Rate | 100 | 31 |
| Embodied decision-making | ALFWorld | Success Rate | 82.84 | 31 |
| Mean Reward | ALFWorld | Mean Reward | 0.767 | 30 |
| Text-based embodied AI | ALFWorld | Pick Success | 100 | 30 |
| Multi-turn planning | ALFWorld (test) | Reward | 97.9 | 30 |
| Embodied Task Execution | ALFWorld | Success Rate | 93.28 | 29 |
| Interactive Instruction Following | ALFWorld Unseen | Success Rate | 86.68 | 28 |