Green-VLA: Staged Vision-Language-Action Model for Generalist Robots

About

We introduce Green-VLA, a staged Vision-Language-Action (VLA) framework for real-world deployment on the Green humanoid robot while maintaining generalization across diverse embodiments. Green-VLA follows a five-stage curriculum: (L0) foundational VLMs, (L1) multimodal grounding, (R0) multi-embodiment pretraining, (R1) embodiment-specific adaptation, and (R2) reinforcement-learning (RL) policy alignment. We couple a scalable data-processing pipeline (3,000 hours of demonstrations) with temporal alignment and quality filtering, and use a unified, embodiment-aware action interface that enables a single policy to control humanoids, mobile manipulators, and fixed-base arms. At inference, the VLA controller is enhanced with episode-progress prediction, out-of-distribution detection, and joint-prediction-based guidance to improve safety and precise target selection. Experiments on Simpler BRIDGE WidowX and CALVIN ABC-D, as well as real-robot evaluations, show strong generalization and performance gains from RL alignment in success rate, robustness, and long-horizon efficiency.
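
The abstract does not say how the unified, embodiment-aware action interface is laid out. As a rough illustration only, one common way to let a single policy drive robots with different action dimensions is a fixed-size action vector plus a per-embodiment mask. Everything in the sketch below (UNIFIED_DIM, the Embodiment class, the slot layouts, to_unified, from_unified) is a hypothetical name chosen for this example and is not taken from Green-VLA.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical size of the shared action vector (an assumption, not from the paper).
UNIFIED_DIM = 8


@dataclass(frozen=True)
class Embodiment:
    """Hypothetical description of which unified slots a robot actually uses."""
    name: str
    slots: Tuple[int, ...]


# Illustrative layouts only; real robots would need their own slot assignments.
FIXED_ARM = Embodiment("fixed_base_arm", (0, 1, 2, 3))
MOBILE_MANIPULATOR = Embodiment("mobile_manipulator", (0, 1, 2, 3, 4, 5))


def to_unified(emb: Embodiment, native: Sequence[float]) -> Tuple[List[float], List[bool]]:
    """Scatter a robot-specific action into the shared vector and build its mask."""
    if len(native) != len(emb.slots):
        raise ValueError(f"{emb.name} expects {len(emb.slots)} values, got {len(native)}")
    unified = [0.0] * UNIFIED_DIM
    mask = [False] * UNIFIED_DIM
    for slot, value in zip(emb.slots, native):
        unified[slot] = float(value)
        mask[slot] = True  # marks slots that are meaningful for this embodiment
    return unified, mask


def from_unified(emb: Embodiment, unified: Sequence[float]) -> List[float]:
    """Gather the slots this embodiment uses back into its native action."""
    if len(unified) != UNIFIED_DIM:
        raise ValueError(f"expected {UNIFIED_DIM} values, got {len(unified)}")
    return [unified[slot] for slot in emb.slots]


if __name__ == "__main__":
    unified, mask = to_unified(FIXED_ARM, [0.1, 0.2, 0.3, 1.0])
    print(unified, mask)
    print(from_unified(FIXED_ARM, unified))
```

In a setup like this, the mask would typically be used to ignore unused slots in the training loss, so that padded dimensions do not affect the policy; whether Green-VLA does this is not stated in the abstract.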

I. Apanasevich, M. Artemyev, R. Babakyan, P. Fedotova, D. Grankin, E. Kupryashin, A. Misailidi, D. Nerus, A. Nutalapati, G. Sidorov, I. Efremov, M. Gerasyov, D. Pikurov, Y. Senchenko, S. Davidenko, D. Kulikov, M. Sultankin, K. Askarbek, O. Shamanin, D. Statovoy, E. Zalyaev, I. Zorin, A. Letkin, E. Rusakov, A. Silchenko, V. Vorobyov, S. Sobolnikov, A. Postnikov • 2026

Related benchmarks

Task | Dataset | Result | Rank
Robot Manipulation | SimplerEnv Google Robot tasks Visual Matching | Pick Coke Can Success Rate 98.1 | 62
Robot Manipulation | SimplerEnv Google Robot tasks Variant Aggregation | Pick Coke Can Success Rate 98.2 | 44
Robot Manipulation | SimplerEnv WidowX Robot tasks | Average Success Rate 79.1 | 26
Robot Manipulation | SimplerEnv Google Robot tasks - Overall | Average Success 71.8 | 7
Bimanual Table-cleaning | ALOHA table-cleaning | Tape SR 83.1 | 5
