Green-VLA: Staged Vision-Language-Action Model for Generalist Robots

About

We introduce Green-VLA, a staged Vision-Language-Action (VLA) framework for real-world deployment on the Green humanoid robot while maintaining generalization across diverse embodiments. Green-VLA follows a five stage curriculum: (L0) foundational VLMs, (L1) multimodal grounding, (R0) multi-embodiment pretraining, (R1) embodiment-specific adaptation, and (R2) reinforcement-learning (RL) policy alignment. We couple a scalable data-processing pipeline (3,000 hours of demonstrations) with temporal alignment and quality filtering, and use a unified, embodiment-aware action interface enabling a single policy to control humanoids, mobile manipulators, and fixed-base arms. At inference, the VLA controller is enhanced with episode-progress prediction, out-of-distribution detection, and joint-prediction-based guidance to improve safety and precise target selection. Experiments on Simpler BRIDGE WidowX and CALVIN ABC-D, as well as real-robot evaluations, show strong generalization and performance gains from RL alignment in success rate, robustness, and long-horizon efficiency.

I. Apanasevich, M. Artemyev, R. Babakyan, P. Fedotova, D. Grankin, E. Kupryashin, A. Misailidi, D. Nerus, A. Nutalapati, G. Sidorov, I. Efremov, M. Gerasyov, D. Pikurov, Y. Senchenko, S. Davidenko, D. Kulikov, M. Sultankin, K. Askarbek, O. Shamanin, D. Statovoy, E. Zalyaev, I. Zorin, A. Letkin, E. Rusakov, A. Silchenko, V. Vorobyov, S. Sobolnikov, A. Postnikov• 2026

Related benchmarks

Task	Dataset	Result
Robot Manipulation	SimplerEnv Google Robot tasks Variant Aggregation	Average Success Rate73.7	88
Robot Manipulation	SimplerEnv Google Robot tasks Visual Matching	Pick Coke Can Success Rate98.1	62
Robot Manipulation	SimplerEnv WidowX Robot tasks	Average Success Rate79.1	32
Robot Manipulation	SimplerEnv Google Robot tasks - Overall	Average Success71.8	7
Robot Manipulation	SimplerEnv GoogleRobot (out-of-domain)	Success Rate (Pick Coke Can)90.4	6
Bimanual Table-cleaning	ALOHA table-cleaning	Tape SR83.1	5

Showing 6 of 6 rows

Other info

GitHub

Follow for update

@wizwand_team Discord