| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Long-horizon Task Execution | Minecraft Long-horizon Tasks | Wood | 100 | 15 |
| Short-horizon dependency-based functional block utilization | Minecraft | CraftGrid Success@0→10 | 57.5 | 11 |
| Multi-step dependency reasoning | Minecraft | WeaponSet Success@0→10 | 40 | 11 |
| Functionally equivalent reasoning | Minecraft | BridgeEq Success@0→10 | 50 | 11 |
| Structural and shape-based recipe transfer | Minecraft | Bed Success Rate (0->10 steps) | 62.5 | 11 |
| Embodied Agent Task Completion | Minecraft Armor Group | Success Rate (SR) | 55.6 | 8 |
| Embodied Agent Task Completion | Minecraft Redstone Group | Success Rate (SR) | 49.4 | 8 |
| Embodied Agent Task Completion | Minecraft Diamond Group | Success Rate (SR) | 66.1 | 8 |
| Embodied Agent Task Completion | Minecraft Gold Group | Success Rate (SR) | 72.3 | 8 |
| Embodied Agent Task Completion | Minecraft Iron Group | Success Rate (SR) | 74 | 8 |
| Embodied Agent Task Completion | Minecraft Stone Group | Success Rate (SR) | 80 | 8 |
| Embodied Agent Task Completion | Minecraft Wood Group | Success Rate (SR) | 95.7 | 8 |
| Sequential Milestone Success Rate | Minecraft Obtain Diamond task | Log Success Rate | 100 | 8 |
| Video Generation | Minecraft | FVD | 62.43 | 8 |
| Video Prediction | Minecraft (300 frames) | SSIM | 0.506 | 6 |
| Video Prediction | Minecraft | SSIM | 34.9 | 6 |
| Long-Context Video Prediction | Minecraft 128x128 (test) | SSIM | 0.448 | 6 |
| Open-Ended Instruction Task Execution | Minecraft Open-Ended Instruction Tasks (test) | Torch Success Rate | 75 | 6 |
| Boss Combat | Minecraft Ender Dragon (the End) | Health Ratio | 67.9 | 4 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~200 frames) | PSNR | 14.02 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~128 frames) | PSNR | 14.9 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~64 frames) | PSNR | 16.31 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~32 frames) | PSNR | 17.87 | 3 |
| Inverse Dynamics Modeling | Minecraft | Pearson R (X) | 80.29 | 2 |
| Image Classification (Animal Presence Detection) | Minecraft (test) | Top-1 Accuracy | - | 0 |