| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Interactive Decision Making | ScienceWorld Seen | Success Rate | 88.8 | 72 |
| Interactive Decision Making | ScienceWorld | Success Rate | 54.2 | 42 |
| Interactive Decision Making | ScienceWorld Unseen | Success Rate | 85.15 | 32 |
| Interactive Reasoning | ScienceWorld (Seen) | Success Rate | 63.91 | 31 |
| Mean Reward | ScienceWorld | Mean Reward | 0.319 | 30 |
| Science Simulation Task Completion | ScienceWorld Unseen | Success Rate | 66.3 | 28 |
| Science Simulation Task Completion | ScienceWorld Seen | Success Rate | 69.8 | 28 |
| Multi-turn Agentic Task | ScienceWorld | Success Rate | 62 | 28 |
| Interactive Science Reasoning | ScienceWorld (test) | Score | 84.6 | 27 |
| Science Experiment Execution | ScienceWorld (test) | Success Rate | 51.51 | 24 |
| Interactive Decision-making | ScienceWorld Unseen (test) | Success Rate | 58.94 | 24 |
| Interactive Environment Task Completion | ScienceWorld (Unseen) | Average Reward | 90.1 | 22 |
| Interactive Environment Task Completion | ScienceWorld (Seen) | Average Reward | 89.5 | 22 |
| Agentic Reasoning | ScienceWorld | Original Score | 82.2 | 20 |
| Interactive Decision Making | ScienceWorld Seen (val) | Average Reward | 0.7349 | 20 |
| World Modeling | ScienceWorld | Matter Score | 52.8 | 20 |
| Embodied Agent Task | ScienceWorld Unseen | Success Rate | 70.8 | 18 |
| Embodied Agent Task | ScienceWorld Seen | Success Rate | 70.9 | 18 |
| Text-based Task Completion | ScienceWorld | Mean Normalised Score | 32.43 | 18 |
| Agentic task completion | ScienceWorld | L0 Score | 75 | 18 |
| Agent Interaction | ScienceWorld (test) | Success Rate | 38.18 | 17 |
| Agent Interaction | ScienceWorld (val) | Success Rate | 44.08 | 17 |
| scientific reasoning | ScienceWorld | Overall Score | 83.7 | 16 |
| Interactive Agent Task | ScienceWorld | Efficiency Factor | 11.5 | 15 |
| Interactive Decision Making | ScienceWorld (OOD) | Score | 9.9 | 14 |