Galaxea Open-World Dataset and G0 Dual-System VLA Model

About

We present the Galaxea Open-World Dataset, a large-scale, diverse collection of robot behaviors recorded in authentic human living and working environments. All demonstrations are gathered using a consistent robotic embodiment and are paired with precise subtask-level language annotations to facilitate both training and evaluation. Building on this dataset, we introduce G0, a dual-system framework that couples a Vision-Language Model (VLM) for multimodal planning with a Vision-Language-Action (VLA) model for fine-grained execution. G0 is trained using a three-stage curriculum: cross-embodiment pre-training, single-embodiment pre-training, and task-specific post-training. A comprehensive benchmark spanning tabletop manipulation, few-shot learning, and long-horizon mobile manipulation demonstrates the effectiveness of our approach. In particular, we find that the single-embodiment pre-training stage, together with the Galaxea Open-World Dataset, plays a critical role in achieving strong performance.

Tao Jiang, Tianyuan Yuan, Yicheng Liu, Chenhao Lu, Jianning Cui, Xiao Liu, Shuiqi Cheng, Jiyang Gao, Huazhe Xu, Hang Zhao • 2025
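
The dual-system split described in the abstract can be pictured with a minimal sketch. It is an illustration under stated assumptions, not G0's actual interface: the class `DualSystem`, the callables `toy_planner` and `toy_executor`, and the `CURRICULUM` tuple are all hypothetical names, and the toy planner and executor are trivial stand-ins for the VLM and VLA. Only the ordering of the three training stages and the planner/executor division come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Stage names as listed in the abstract; the tuple itself is a hypothetical container.
CURRICULUM = (
    "cross-embodiment pre-training",
    "single-embodiment pre-training",
    "task-specific post-training",
)


@dataclass
class DualSystem:
    """Hypothetical wrapper: a planner proposes subtasks, an executor emits actions."""

    # Stand-in for the VLM: (instruction, observation) -> list of subtask strings.
    planner: Callable[[str, object], List[str]]
    # Stand-in for the VLA: (subtask, observation) -> action vector.
    executor: Callable[[str, object], List[float]]

    def run(self, instruction: str, observation: object) -> List[Tuple[str, List[float]]]:
        # Plan once, then execute each subtask in order (an assumed control flow).
        trace = []
        for subtask in self.planner(instruction, observation):
            action = self.executor(subtask, observation)
            trace.append((subtask, action))
        return trace


def toy_planner(instruction: str, observation: object) -> List[str]:
    # Toy rule: split the instruction on " then " to get subtask-level language.
    return [part.strip() for part in instruction.split(" then ")]


def toy_executor(subtask: str, observation: object) -> List[float]:
    # Toy rule: return a one-dimensional "action" derived from the subtask text.
    return [float(len(subtask))]


system = DualSystem(planner=toy_planner, executor=toy_executor)
trace = system.run("pick up the nut then place it on the peg", observation=None)
```

In this sketch the planner runs once per instruction; whether G0 replans during execution is not stated in the abstract, so that choice is an assumption made for brevity.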

Related benchmarks

Task               | Dataset                          | Result           | Rank
Robot Manipulation | Nut Assembly Galaxea R1 Lite     | Success Rate: 10 | 2
Robot Manipulation | Tube Arrangement Galaxea R1 Lite | Success Rate: 20 | 2
