Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer
About
General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control. This report introduces the latest generation of the Gemini Robotics model family: Gemini Robotics 1.5, a multi-embodiment Vision-Language-Action (VLA) model, and Gemini Robotics-ER 1.5, a state-of-the-art Embodied Reasoning (ER) model. We bring together three major innovations. First, Gemini Robotics 1.5 features a novel architecture and a Motion Transfer (MT) mechanism that let it learn from heterogeneous, multi-embodiment robot data, which makes the VLA more general. Second, Gemini Robotics 1.5 interleaves actions with a multi-level internal reasoning process in natural language. This lets the robot "think before acting," which notably improves its ability to decompose and execute complex, multi-step tasks and also makes its behavior more interpretable to the user. Third, Gemini Robotics-ER 1.5 establishes a new state of the art for embodied reasoning, i.e., for the reasoning capabilities that are critical for robots, such as visual and spatial understanding, task planning, and progress estimation. Together, this family of models takes us a step towards an era of physical agents, enabling robots to perceive, think, and then act so they can solve complex multi-step tasks.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Spatial Reasoning | ROBOSPATIAL | Overall Score | 62.5 | 29 |
| Episodic-memory Question Answering | OpenEQA v1 (ScanNet) | LLM-Match | 59.2 | 29 |
| Robot Manipulation | BOP-ASK | Pose Error | 0.00e+0 | 15 |
| Spatial Reasoning | RefSpatial | Accuracy (Spatial Reasoning) | 41.72 | 15 |
| Spatial Reasoning | CVBench | 2D Relationship Score | 95.54 | 15 |
| Image Pointing | Point-Bench | Average Score | 67.1 | 15 |
| Spatial Reasoning | BLINK | Depth | 69.23 | 15 |
| Path-level reasoning | UNOBench real (Hard) | SR-P | 38.9 | 10 |
| Path-level reasoning | UNOBench synthetic Easy (test) | SR (Precision) | 67.1 | 10 |
| Path-level reasoning | UNOBench real No obstructions | SR (%) | 37.7 | 10 |