| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Scientific Reasoning | SciWorld | Accuracy | 95.9 | 164 |
| Science Simulation | SciWorld | Accuracy | 94.6 | 41 |
| Embodied Agentic | SciWorld | Accuracy | 88.9 | 21 |
| One-step next-observation prediction | SciWorld (test) | Token F1 | 96 | 16 |
| Next-state prediction | SciWorld (SW) | EM Accuracy | 98.64 | 16 |
| Task success | SciWorld | Real | 68.21 | 14 |
| Scientific Reasoning | SciWorld | Success Rate (SR) | 59.48 | 14 |
| Text-based embodied task | Sciworld | Success Rate | 77.77 | 13 |
| Science simulation | Sciworld | Progress Rate | 82.6 | 12 |
| Scientific Simulation | SciWorld | Measure Score | 55.7 | 10 |
| Lifelong Agent Interaction | SciWorld | Success Rate (SR) | 73.5 | 10 |
| Interactive Agent | SciWorld | Pass@1 | 29.45 | 10 |
| Autonomous Exploration | SciWorld | Steps | 7.4 | 7 |
| Science Experiment Simulation | SciWorld | Average Reward | 69.1 | 3 |