OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
About
General-purpose robots capable of performing diverse tasks require synergistic reasoning and acting capabilities. However, recent dual-system approaches, which separate high-level reasoning from low-level acting, often suffer from limited mutual understanding between the two systems and from latency issues. This paper introduces OneTwoVLA, a single unified vision-language-action model that can both act (System One) and reason (System Two). Crucially, OneTwoVLA adaptively switches between two modes: it reasons explicitly at critical moments during task execution, and generates actions based on the most recent reasoning at all other times. To further unlock OneTwoVLA's reasoning and generalization capabilities, we design a scalable pipeline for synthesizing embodied reasoning-centric vision-language data, which is used for co-training with robot data. We validate OneTwoVLA's effectiveness through extensive experiments, highlighting its superior performance across four key capabilities: long-horizon task planning, error detection and recovery, natural human-robot interaction, and generalizable visual grounding. Together, these capabilities enable the model to perform long-horizon, highly dexterous manipulation tasks such as making hotpot or mixing cocktails.
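The adaptive mode switching described above can be sketched as a simple control loop: at each step, the unified model either produces explicit reasoning (System Two) or emits an action conditioned on its most recent reasoning (System One). The sketch below is a hypothetical illustration only; the class, method names, and the trigger heuristic are invented for clarity and are not the authors' actual architecture or API.

```python
from dataclasses import dataclass


@dataclass
class UnifiedVLA:
    """Toy stand-in for a unified vision-language-action model."""
    last_reasoning: str = "initial plan"  # most recent reasoning text
    step: int = 0

    def should_reason(self, observation: str) -> bool:
        # In the paper, the model itself decides when to reason; here we
        # fake "critical moments" with a crude heuristic (errors, or
        # periodic subtask boundaries).
        return "error" in observation or self.step % 3 == 0

    def reason(self, observation: str) -> str:
        # System Two: update the plan in natural language.
        self.last_reasoning = f"replan given '{observation}'"
        return self.last_reasoning

    def act(self, observation: str) -> str:
        # System One: emit an action conditioned on the latest reasoning.
        return f"action conditioned on '{self.last_reasoning}'"


def control_loop(model: UnifiedVLA, observations: list[str]) -> list[str]:
    """Run one episode: reason only at critical moments, act otherwise."""
    actions = []
    for obs in observations:
        if model.should_reason(obs):
            model.reason(obs)           # explicit reasoning (System Two)
        actions.append(model.act(obs))  # act on most recent reasoning
        model.step += 1
    return actions
```

The key property this loop illustrates is that acting never waits on a separate high-level planner: reasoning is interleaved in the same model and refreshed only when needed, which is how the unified design avoids the latency of dual-system pipelines.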
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Error Detection and Recovery | Hotpot Robot Data (test) | Recovery Success Ratio | 5 | 3 |
| Error Detection and Recovery | Robot Tasks Combined Total (test) | Successful Recoveries Count | 8 | 3 |
| Visual Grounding | Single-Env (test) | Success Rate | 88 | 3 |
| Visual Grounding | Open-World (test) | Success Rate | 73 | 3 |
| Error Detection and Recovery | Tomato-Egg Robot Data (test) | Recovery Success Rate | 3 | 3 |
| Human-Robot Interaction | HotPot | Successes | 10 | 2 |
| Human-Robot Interaction | Cocktail | Successes | 10 | 2 |
| Human-Robot Interaction | Hotpot and Cocktail Aggregate | Successes | 20 | 2 |