
Multi-Modal Manipulation via Multi-Modal Policy Consensus

About

Effectively integrating diverse sensory modalities is crucial for robotic manipulation. However, the typical approach of feature concatenation is often suboptimal: dominant modalities such as vision can overwhelm sparse but critical signals like touch in contact-rich tasks, and monolithic architectures cannot flexibly incorporate new or missing modalities without retraining. Our method factorizes the policy into a set of diffusion models, each specialized for a single representation (e.g., vision or touch), and employs a router network that learns consensus weights to adaptively combine their contributions, enabling incremental incorporation of new representations. We evaluate our approach on simulated manipulation tasks in RLBench, as well as real-world tasks such as occluded object picking, in-hand spoon reorientation, and puzzle insertion, where it significantly outperforms feature-concatenation baselines on scenarios requiring multimodal reasoning. Our policy is also robust to physical perturbations and sensor corruption. Finally, a perturbation-based importance analysis reveals adaptive shifts between modalities.
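The consensus step described above can be illustrated with a small sketch. It assumes that each modality-specific diffusion model outputs a noise prediction of the same shape and that the router emits one scalar logit per modality, turned into weights with a softmax. The abstract does not specify the exact combination rule, so the function name `consensus_noise`, the softmax weighting, and the renormalization over available modalities are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def consensus_noise(noise_preds, router_logits):
    """Combine per-modality noise predictions using router weights.

    noise_preds:   dict mapping modality name -> noise prediction (same shape each)
    router_logits: dict mapping modality name -> scalar logit
                   (hypothetical router output, assumed for this sketch)

    Only modalities present in noise_preds are used, so a missing sensor is
    dropped and the remaining weights are renormalized by the softmax.
    """
    names = sorted(noise_preds)
    if not names:
        raise ValueError("at least one modality is required")
    logits = np.array([router_logits[name] for name in names], dtype=float)
    weights = softmax(logits)
    # Weighted sum of the individual predictions (assumed consensus rule).
    combined = sum(w * noise_preds[name] for w, name in zip(weights, names))
    return combined, dict(zip(names, weights))
```

With equal logits for `vision` and `touch`, the two predictions are simply averaged. Passing only the `touch` prediction gives that model full weight, which is one simple way a factorized policy could tolerate a missing modality without retraining.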

Haonan Chen, Jiaming Xu, Hongyu Chen, Kaiwen Hong, Binghao Huang, Chaoqi Liu, Jiayuan Mao, Yunzhu Li, Yilun Du, Katherine Driggs-Campbell (2025)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Vase Wiping | Vase Wiping 30 Demos Flexiv Rizon4 Single-arm 1.0 (test) | Task Score | 40.5 | 13 |
| Chip Handover | Chip Handover 50 Demos Bi-Arx5 Dual-arm 1.0 (test) | Success Rate | 15 | 13 |
| Multi-task Performance Aggregation | Combined Five Tasks (Shoe Lacing, Chip Handover, Cucum. Peeling, Vase Wiping, Lock Opening) 1.0 (average) | Average Performance | 24.7 | 13 |
| Shoe Lacing | Shoe Lacing 100 Demos, Bi-Arx5 Dual-arm 1.0 (test) | Success Rate | 0 | 13 |
| Cucumber Peeling | Cucumber Peeling 50 Demos, Bi-Arx5 Dual-arm 1.0 (test) | Task Score | 63 | 13 |
| Lock Opening | Lock Opening 20 Demos Flexiv Rizon4 Single-arm 1.0 (test) | Success Rate | 5 | 13 |
| Robotic Manipulation | Weight-Based Bottle Placement | Success Rate | 15 | 7 |
| Robotic Manipulation | Manipulation Task Suite Bottle, Connector, Lid | Average Success Rate | 54 | 7 |
| Robotic Manipulation | Twisty Connector Pull Out | Success Rate | 1 | 7 |
| Robotic Manipulation | Egg Boiler Lid Opening | Success Rate | 0.55 | 7 |