| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Interactive Reasoning | ScienceWorld (Seen) | Success Rate | 63.91 | 31 |
| Interactive Science Reasoning | ScienceWorld (test) | Score | 84.6 | 27 |
| Interactive Decision-Making | ScienceWorld Unseen (test) | Success Rate | 58.94 | 24 |
| World Modeling | ScienceWorld | Matter Score | 52.8 | 20 |
| Agentic Task Completion | ScienceWorld | L0 Score | 75 | 18 |
| Scientific Reasoning | ScienceWorld Unseen | Average Reward | 58.5 | 10 |
| Scientific Reasoning | ScienceWorld Seen | Average Reward | 71.6 | 10 |
| Textual Environment Interaction | ScienceWorld | Base Score | 96.06 | 8 |
| Interactive Science Simulation | ScienceWorld v1.0 (test) | Task 1-1 (L) Score | 97.04 | 8 |
| Interactive Reasoning | ScienceWorld (Unseen) | Success Rate | 0.5862 | 7 |
| Scientific Reasoning in Text-based Environments | ScienceWorld (test) | Task 1-1 Score | 44.8 | 7 |
| Science Simulation and Text-based Scientific Reasoning | ScienceWorld variations (test) | Changes of State: Boiling Success | 4 | 7 |
| Sequential Decision Making | ScienceWorld | Pass@1 Success Rate | 70.4 | 4 |
| Scientific Reasoning | ScienceWorld | Seen Accuracy | 74.5 | 3 |