Robix: A Unified Model for Robot Interaction, Reasoning and Planning

About

We introduce Robix, a unified model that integrates robot reasoning, task planning, and natural language interaction within a single vision-language architecture. Acting as the high-level cognitive layer in a hierarchical robot system, Robix dynamically generates atomic commands for the low-level controller and verbal responses for human interaction, enabling robots to follow complex instructions, plan long-horizon tasks, and interact naturally with humans within an end-to-end framework. Robix further introduces novel capabilities such as proactive dialogue, real-time interruption handling, and context-aware commonsense reasoning during task execution. At its core, Robix leverages chain-of-thought reasoning and adopts a three-stage training strategy: (1) continued pretraining to enhance foundational embodied reasoning abilities, including 3D spatial understanding, visual grounding, and task-centric reasoning; (2) supervised finetuning to model human-robot interaction and task planning as a unified reasoning-action sequence; and (3) reinforcement learning to improve reasoning-action consistency and long-horizon task coherence. Extensive experiments show that Robix outperforms both open-source and commercial baselines (e.g., GPT-4o and Gemini 2.5 Pro) in interactive task execution, with strong generalization across diverse instruction types (e.g., open-ended, multi-stage, constrained, invalid, and interrupted) and various user-involved tasks such as table bussing, grocery shopping, and dietary filtering.

Huang Fang, Mengxi Zhang, Heng Dong, Wei Li, Zixuan Wang, Qifeng Zhang, Xueyun Tian, Yucheng Hu, Hang Li • 2025

Related benchmarks

Task                     Dataset         Metric            Result  Rank
Spatial Mental Modeling  SAT (real)      AVG               79.6    41
Spatial Reasoning        EmbSpatial      Overall Accuracy  77.4    30
Spatial Reasoning        SAT ood (test)  Accuracy          79.6    11
Spatial Reasoning        VSR (ood)       Accuracy          83.7    10
Multi-modal Reasoning    CV-Bench        Overall Accuracy  86.5    6
Spatial Awareness        SAT             Accuracy (All)    71.1    6
Object Placement         Where2Place     Overall Score     41.9    6