| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Embodied Reasoning | ALFWorld | Accuracy | 0.96 | 151 |
| Interactive Decision-making | ALFWorld | Overall Success Rate | 97.71 | 118 |
| Embodied Task | ALFWorld | Overall Success Rate | 96.9 | 96 |
| Instruction Following | ALFWorld | Accuracy | 89.3 | 82 |
| Interactive Decision Making | ALFWorld (test) | Success Rate | 96.87 | 67 |
| Embodied decision-making | ALFWorld | Success Rate | 82.84 | 31 |
| Interactive Task Completion | ALFWorld | Pick Success Rate | 100 | 28 |
| Sequential Decision Making | ALFWorld (test) | Success Rate | 96.27 | 26 |
| Interactive Decision Making | ALFWorld Unseen | Success Rate | 97.76 | 23 |
| Interactive Decision Making | ALFWorld Seen | Success Rate | 97.86 | 23 |
| Embodied Interaction | ALFWorld | Success Rate | 94.78 | 22 |
| Decision Making | ALFWorld | Steps | 6.4 | 22 |
| Agentic Reasoning | ALFWorld (test) | Success Rate | 97.7 | 21 |
| Agent Task | ALFWorld | Success Rate | 83.6 | 21 |
| Household Agent Interaction | ALFWorld | Pick Success Rate | 99.1 | 20 |
| Interactive environment task success | ALFWorld (test) | Overall Success Rate | 91.79 | 20 |
| Interactive Embodied Agent Task | ALFWorld (val) | Pick Success Rate | 100 | 19 |
| Trajectory Unlearning | ALFWorld | Unlearn Efficacy | 100 | 18 |
| Multi-turn decision making | ALFWorld (val-unseen) | Success Rate | 94.1 | 18 |
| Embodied AI Reasoning | ALFWorld | CoT Match Rate | 100 | 18 |
| Interactive Reasoning | ALFWorld | Average Reasoning Length (tokens) | 47.9 | 18 |
| Agentic task completion | ALFWorld | Look Success | 100 | 18 |
| Agent Behavior Adaptation | ALFWorld (AW) (test) | Loop Ratio | 1,040 | 17 |
| Embodied Task Planning | ALFWorld (test) | Success Rate (Avg) | 98.7 | 17 |
| Next-state prediction | ALFWorld (AW) | EM Accuracy | 99.87 | 16 |