InCoM: Intent-Driven Perception and Structured Coordination for Mobile Manipulation

About

Mobile manipulation is a fundamental capability for general-purpose robotic agents, requiring both coordinated control of the mobile base and manipulator and robust perception under dynamically changing viewpoints. However, existing approaches face two key challenges: strong coupling between base and arm actions complicates control optimization, and perceptual attention is often poorly allocated as viewpoints shift during mobile manipulation. We propose InCoM, an intent-driven perception and structured coordination framework for mobile manipulation. InCoM infers latent motion intent to dynamically reweight multi-scale perceptual features, enabling stage-adaptive allocation of perceptual attention. To support robust cross-modal perception, InCoM further incorporates a geometric-semantic structured alignment mechanism that enhances multimodal correspondence. On the control side, we design a decoupled coordinated flow matching action decoder that explicitly models coordinated base-arm action generation, alleviating optimization difficulties caused by control coupling. Experimental results demonstrate that InCoM significantly outperforms state-of-the-art methods, achieving success rate gains of 28.2%, 26.1%, and 23.6% across three ManiSkill-HAB scenarios without privileged information. Furthermore, its effectiveness is consistently validated in real-world mobile manipulation tasks, where InCoM maintains a superior success rate over existing baselines.

Jiahao Liu, Cui Wenbo, Zhongpu Xia, Haoran Li, Dongbin Zhao • 2026
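
The abstract does not describe the internals of the decoupled coordinated flow matching action decoder. The Python code below is a minimal, hypothetical sketch of how such a sampler might be organized. It assumes (i) separate velocity heads for the base and the arm, (ii) an arm head conditioned on the current base sample, and (iii) trained networks replaced by the analytic velocity of the straight-line flow-matching path toward a known goal action, so that only the sampling loop and the base-then-arm conditioning order are illustrated. The names `base_field`, `arm_field`, `sample_actions`, the `goal` dictionary, and the 3-D action layouts are illustrative assumptions and do not come from InCoM.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def straight_flow_velocity(x: Sequence[float], target: Sequence[float], t: float) -> Vector:
    """Velocity of the straight-line path x_t = (1 - t) * x0 + t * target.

    Stand-in for a trained network: the flow-matching regression target
    u = target - x0, rewritten in terms of x_t, equals (target - x_t) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def base_field(x_base: Vector, t: float, goal: Dict[str, Vector]) -> Vector:
    # Hypothetical base head: conditioned only on the observation (here, the goal).
    return straight_flow_velocity(x_base, goal["base"], t)


def arm_field(x_arm: Vector, t: float, goal: Dict[str, Vector], x_base: Vector) -> Vector:
    # Hypothetical arm head: also conditioned on the current base sample, so the
    # arm command offsets the base's planar motion to keep the desired
    # end-effector displacement (a toy stand-in for base-arm coordination).
    compensated = [goal["ee"][0] - x_base[0], goal["ee"][1] - x_base[1], goal["ee"][2]]
    return straight_flow_velocity(x_arm, compensated, t)


def sample_actions(goal: Dict[str, Vector], steps: int = 10, seed: int = 0):
    """Euler-integrate both flows from Gaussian noise (t = 0) to actions (t = 1)."""
    rng = random.Random(seed)
    x_base = [rng.gauss(0.0, 1.0) for _ in goal["base"]]
    x_arm = [rng.gauss(0.0, 1.0) for _ in goal["ee"]]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division in the velocity is safe
        # Update the base action first ...
        v_base = base_field(x_base, t, goal)
        x_base = [xi + dt * vi for xi, vi in zip(x_base, v_base)]
        # ... then the arm action, conditioned on the freshly updated base sample.
        v_arm = arm_field(x_arm, t, goal, x_base)
        x_arm = [xi + dt * vi for xi, vi in zip(x_arm, v_arm)]
    return x_base, x_arm


if __name__ == "__main__":
    goal = {"base": [0.3, 0.0, 0.1], "ee": [0.5, 0.2, -0.1]}
    base_action, arm_action = sample_actions(goal)
    print("base:", [round(v, 6) for v in base_action])
    print("arm: ", [round(v, 6) for v in arm_action])
```

With the example goal, the base sample ends at the goal base action and the arm sample ends near [0.2, 0.2, -0.1], i.e. the end-effector target minus the base's planar motion. Updating the base before the arm lets each head model its own action space while the arm still sees where the base is heading, which is one plausible reading of "decoupled" yet "coordinated" generation.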

Related benchmarks

Task | Dataset | Metric | Result | Rank
Mobile Manipulation | SetTable | Open Fridge Success Rate | 87.3 | 12
Inference Efficiency | Inference Efficiency Benchmark | -- | -- | 8
Mobile Manipulation | ManiSkill-HAB TidyHouse | Pick All Success Rate | 16.7 | 5
Mobile Manipulation | ManiSkill-HAB PrepareGroceries | Pick All Success Rate | 15 | 5
Inference Efficiency Comparison | Efficiency Evaluation Setup | Inference Time (ms) | 140 | 3
Robotic Manipulation | ManiSkill-HAB SetTable scenario | Pick Apple Success Rate | 59.4 | 2
