
X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction

About

Effectively handling the interplay between spatial perception and action generation remains a critical bottleneck in robotic manipulation. Existing methods typically treat spatial perception and action execution as decoupled or strictly unidirectional processes, which restricts a robot's ability to master complex manipulation tasks. To address this, we propose X-Imitator, a versatile dual-path framework that models spatial perception and action execution as a tightly coupled bidirectional loop. By conditioning current pose predictions on past actions, and current action predictions on past poses, the framework enables continuous mutual refinement between spatial reasoning and action generation. This joint modeling is analogous to the internal forward models used in human motor control. Designed as a modular architecture, the system can be integrated into various visuomotor policies. Extensive experiments across 24 simulated and 3 real-world tasks demonstrate that our framework significantly outperforms both vanilla policies and prior methods that use explicit pose guidance. The code will be open sourced.
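
The bidirectional loop described above can be illustrated with a toy sketch. This is not the authors' implementation, which has not been released at the time of this listing. The function names (`toy_pose_path`, `toy_action_path`, `bidirectional_rollout`), the linear update rules, and the 0.5 gains are all hypothetical, chosen only to show how a pose estimate conditioned on the previous action can feed an action prediction, whose output then conditions the next pose estimate.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def toy_pose_path(observation: Vector, past_action: Vector) -> Vector:
    """Hypothetical pose path: refine the observed pose using the previous action."""
    return [o + 0.5 * a for o, a in zip(observation, past_action)]


def toy_action_path(goal: Vector, pose: Vector) -> Vector:
    """Hypothetical action path: step toward the goal from the estimated pose."""
    return [0.5 * (g - p) for g, p in zip(goal, pose)]


def bidirectional_rollout(
    observations: Sequence[Vector],
    goal: Vector,
    initial_action: Vector,
) -> List[Tuple[Vector, Vector]]:
    """Alternate the two paths so each one conditions on the other's last output."""
    past_action = list(initial_action)
    trajectory: List[Tuple[Vector, Vector]] = []
    for observation in observations:
        # Pose prediction is conditioned on the most recent action.
        pose = toy_pose_path(observation, past_action)
        # Action prediction is conditioned on the freshly estimated pose.
        action = toy_action_path(goal, pose)
        trajectory.append((pose, action))
        # The new action feeds back into the next pose prediction.
        past_action = action
    return trajectory


if __name__ == "__main__":
    steps = bidirectional_rollout(
        observations=[[0.0, 0.0], [1.0, 1.0]],
        goal=[2.0, 2.0],
        initial_action=[0.0, 0.0],
    )
    for pose, action in steps:
        print("pose:", pose, "action:", action)
```

In a learned policy, the two toy functions would presumably be replaced by neural network heads sharing visual features; the sketch only shows the direction of information flow between them.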

Kai Xiong, Hongjie Fang, Lixin Yang, Cewu Lu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Robot Manipulation | MetaWorld, Adroit, and DexArt Combined | Average Success Rate: 63.8 | 25 |
| Robotic Arm Manipulation | MetaWorld Very Hard | Success Rate: 66.6 | 21 |
| Robot Manipulation | DexArt | Success Rate: 59.4 | 20 |
| Robotic Manipulation Simulation | Adroit | Success Rate: 71.4 | 6 |
| Robotic Manipulation Simulation | MetaWorld hard | Success Rate: 57.7 | 6 |
| Simulated Robotic Manipulation | RoboTwin 2.0 | Hammer Success Rate: 92 | 6 |
| Robot Manipulation | Hang Mug Real-world | Grasp Success Rate: 100 | 2 |
| Robot Manipulation | Pour Balls (Real-world) | Grasp Success Rate: 100 | 2 |
| Robot Manipulation | Arrange Toy Truck (Real-world) | Grasp Success Rate: 100 | 2 |
