| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Long-horizon Task Execution | Minecraft Long-horizon Tasks | Wood | 100 | 15 |
| Video Generation | Minecraft | FVD | 62.43 | 8 |
| Long-Context Video Prediction | Minecraft 128x128 (test) | SSIM | 0.448 | 6 |
| Open-Ended Instruction Task Execution | Minecraft Open-Ended Instruction Tasks (test) | Torch Success Rate | 75 | 6 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~200 frames) | PSNR | 14.02 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~128 frames) | PSNR | 14.9 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~64 frames) | PSNR | 16.31 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~32 frames) | PSNR | 17.87 | 3 |
| Image Classification (Animal Presence Detection) | Minecraft (test) | Top-1 Accuracy | - | 0 |