
World2Minecraft: Occupancy-Driven Simulated Scenes Construction

About

Embodied intelligence requires high-fidelity simulation environments to support perception and decision-making, yet existing platforms often suffer from data contamination and limited flexibility. To mitigate this, we propose World2Minecraft, which converts real-world scenes into structured Minecraft environments based on 3D semantic occupancy prediction. The reconstructed scenes directly support downstream tasks such as Vision-Language Navigation (VLN). However, we observe that reconstruction quality depends heavily on accurate occupancy prediction, which remains limited by data scarcity and the poor generalization of existing models. To address this, we introduce a low-cost, automated, and scalable data acquisition pipeline for creating customized occupancy datasets, and demonstrate its effectiveness through MinecraftOcc, a large-scale dataset featuring 100,165 images from 156 richly detailed indoor scenes. Extensive experiments show that our dataset provides a critical complement to existing datasets and poses a significant challenge to current state-of-the-art (SOTA) methods. These findings contribute to improving occupancy prediction and highlight the value of World2Minecraft as a customizable and editable platform for personalized embodied AI research. Project page: https://world2minecraft.github.io/.
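The core idea of turning a semantic occupancy grid into a Minecraft scene can be illustrated with a minimal sketch. It is not the authors' implementation: the grid layout (indexed `[x][y][z]` with `y` pointing up), the label `0` for free space, the `CLASS_TO_BLOCK` palette, and the function name `occupancy_to_commands` are all assumptions made for illustration. Only the `setblock` command and the block IDs are standard Minecraft Java Edition names.

```python
from typing import Dict, List, Sequence, Tuple

EMPTY = 0  # assumed label for free space

# Hypothetical class-to-block palette; the paper's actual mapping is not given here.
CLASS_TO_BLOCK: Dict[int, str] = {
    1: "minecraft:oak_planks",      # e.g. floor
    2: "minecraft:white_concrete",  # e.g. wall
    3: "minecraft:bookshelf",       # e.g. furniture
}


def occupancy_to_commands(
    grid: Sequence[Sequence[Sequence[int]]],
    palette: Dict[int, str],
    origin: Tuple[int, int, int] = (0, 64, 0),
    fallback: str = "minecraft:stone",
) -> List[str]:
    """Emit one `setblock` command per occupied voxel.

    `grid[x][y][z]` holds a semantic class id (assumed layout, y is up).
    Classes missing from `palette` are placed as `fallback`.
    """
    ox, oy, oz = origin
    commands: List[str] = []
    for x, plane in enumerate(grid):
        for y, column in enumerate(plane):
            for z, label in enumerate(column):
                if label == EMPTY:
                    continue  # free space: leave the world untouched
                block = palette.get(label, fallback)
                commands.append(f"setblock {ox + x} {oy + y} {oz + z} {block}")
    return commands


if __name__ == "__main__":
    # Tiny 1 x 2 x 2 grid: one floor voxel, one wall voxel, one unknown class.
    demo_grid = [[[1, 0], [2, 9]]]
    for line in occupancy_to_commands(demo_grid, CLASS_TO_BLOCK):
        print(line)
```

The resulting lines could be saved to an `.mcfunction` file in a datapack, where commands are written without a leading slash. Because every voxel becomes an explicit block placement, the scene stays editable after conversion, which matches the customizability the abstract emphasizes.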

Lechao Zhang, Haoran Xu, Jingyu Gong, Xuhong Wang, Yuan Xie, Xin Tan • 2026

Related benchmarks

Task                           Dataset                 Result        Rank
3D Occupancy Prediction        MinecraftOcc            --            12
Next-Action                    MinecraftVLN Base       --            6
Next-Action                    MinecraftVLN (Extend)   --            6
Next-Action                    MinecraftVLN (Combined) --            6
Next-View                      MinecraftVLN Base       --            6
Next-View                      MinecraftVLN (Extend)   --            6
Next-View                      MinecraftVLN (Combined) --            6
Layout-based scene generation  MinecraftVLN (test)     OOB Rate 2.4  4
