Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

About

Generalization in embodied AI is hindered by the "seeing-to-doing gap," which stems from data scarcity and embodiment heterogeneity. To address this, we pioneer "pointing" as a unified, embodiment-agnostic intermediate representation, defining four core embodied pointing abilities that bridge high-level vision-language comprehension with low-level action primitives. We introduce Embodied-R1, a 3B Vision-Language Model (VLM) specifically designed for embodied reasoning and pointing. We use a wide range of embodied and general visual reasoning datasets as sources to construct a large-scale dataset, Embodied-Points-200K, which supports key embodied pointing capabilities. We then train Embodied-R1 using a two-stage Reinforced Fine-tuning (RFT) curriculum with a specialized multi-task reward design. Embodied-R1 achieves state-of-the-art performance on 11 embodied spatial and pointing benchmarks. Critically, it demonstrates robust zero-shot generalization by achieving a 56.2% success rate in the SIMPLEREnv and 87.5% across 8 real-world XArm tasks without any task-specific fine-tuning, representing a 62% improvement over strong baselines. Furthermore, the model exhibits high robustness against diverse visual disturbances. Our work shows that a pointing-centric representation, combined with an RFT training paradigm, offers an effective and generalizable pathway to closing the perception-action gap in robotics.

Yifu Yuan, Haiqin Cui, Yaoting Huang, Yibin Chen, Fei Ni, Zibin Dong, Pengyi Li, Yan Zheng, Hongyao Tang, Jianye Hao • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Spatial Reasoning | BLINK | Spa. Score: 80.4 | 26 |
| Embodied Spatial Point Reasoning | Where2Place | Accuracy: 69.5 | 19 |
| Grounded Task Planning | GroundedPlanBench Implicit Instructions, Short Horizon | TSR: 31 | 15 |
| Grounded Task Planning | GroundedPlanBench Explicit Instructions, Short Horizon | TSR: 41.8 | 15 |
| Grounded Task Planning | GroundedPlanBench Explicit Instructions, Long Horizon | TSR: 8.4 | 15 |
| Grounded Task Planning | GroundedPlanBench Implicit Instructions, Medium Horizon | TSR: 10.4 | 15 |
| Grounded Task Planning | GroundedPlanBench Implicit Instructions, Long Horizon | TSR: 3.6 | 15 |
| Grounded Task Planning | GroundedPlanBench Explicit Instructions, Medium Horizon | TSR: 13.6 | 15 |
| Spatial Reasoning | CVBench | -- | 15 |
| Visual Trace Generation | VABench-V | RMSE: 72.1 | 13 |
Showing 10 of 56 rows
