OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning

About

General-purpose robots capable of performing diverse tasks require synergistic reasoning and acting capabilities. However, recent dual-system approaches, which separate high-level reasoning from low-level acting, often suffer from challenges such as limited mutual understanding of capabilities between systems and latency issues. This paper introduces OneTwoVLA, a single unified vision-language-action model that can perform both acting (System One) and reasoning (System Two). Crucially, OneTwoVLA adaptively switches between two modes: explicitly reasoning at critical moments during task execution, and generating actions based on the most recent reasoning at other times. To further unlock OneTwoVLA's reasoning and generalization capabilities, we design a scalable pipeline for synthesizing embodied reasoning-centric vision-language data, used for co-training with robot data. We validate OneTwoVLA's effectiveness through extensive experiments, highlighting its superior performance across four key capabilities: long-horizon task planning, error detection and recovery, natural human-robot interaction, and generalizable visual grounding, enabling the model to perform long-horizon, highly dexterous manipulation tasks such as making hotpot or mixing cocktails.
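The adaptive switching described above can be illustrated with a toy control loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the class `ToyUnifiedPolicy`, its `is_critical`, `reason`, and `act` callables, and the string observations are all hypothetical stand-ins. In the sketch a hand-written predicate decides when to reason; in the paper that decision is made by the unified model itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ToyUnifiedPolicy:
    """Hypothetical stand-in for one model that can either reason or act."""
    is_critical: Callable[[str], bool]   # assumption: flags "critical moments"
    reason: Callable[[str], str]         # assumption: produces textual reasoning
    act: Callable[[str, str], str]       # assumption: produces an action from (obs, reasoning)
    latest_reasoning: str = ""
    history: List[Tuple[str, str]] = field(default_factory=list)

    def step(self, observation: str) -> Tuple[str, str]:
        # Reason when no reasoning exists yet or the moment is flagged critical;
        # otherwise act, conditioned on the most recent reasoning.
        if not self.latest_reasoning or self.is_critical(observation):
            self.latest_reasoning = self.reason(observation)
            result = ("reason", self.latest_reasoning)
        else:
            result = ("act", self.act(observation, self.latest_reasoning))
        self.history.append(result)
        return result


# Toy callables; a real system would query the model instead.
policy = ToyUnifiedPolicy(
    is_critical=lambda obs: obs.startswith(("error", "done")),
    reason=lambda obs: f"plan after '{obs}'",
    act=lambda obs, plan: f"action for '{obs}' under [{plan}]",
)

observations = ["start", "grasping", "error: dropped item", "grasping", "done: step 1"]
modes = [policy.step(obs)[0] for obs in observations]
print(modes)
```

In this toy run the policy reasons at the start, after the simulated error, and on step completion, and acts on the cached reasoning in between.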

Fanqi Lin, Ruiqian Nai, Yingdong Hu, Jiacheng You, Junming Zhao, Yang Gao • 2025

Related benchmarks

Task | Dataset | Result | Rank
Error Detection and Recovery | Hotpot Robot Data (test) | Recovery Success Ratio: 5 | 3
Error Detection and Recovery | Robot Tasks Combined Total (test) | Successful Recoveries Count: 8 | 3
Visual Grounding | Single-Env (test) | Success Rate: 88 | 3
Visual Grounding | Open-World (test) | Success Rate: 73 | 3
Error Detection and Recovery | Tomato-Egg Robot Data (test) | Recovery Success Rate: 3 | 3
Human-Robot Interaction | HotPot | Successes: 10 | 2
Human-Robot Interaction | Cocktail | Successes: 10 | 2
Human-Robot Interaction | Hotpot and Cocktail Aggregate | Successes: 20 | 2