TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance

About

Fine-grained, contact-rich manipulation remains challenging for robots, largely due to the underutilization of tactile feedback. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy at inference time. First, the policy produces a coarse, visually plausible action using only visual inputs during early sampling. Second, a task-specific Contact Physical Model (CPM) provides tactile guidance to steer and refine the action, ensuring it aligns with realistic physical contact conditions. Trained through contrastive learning on limited expert demonstrations, the CPM provides a tactile-informed feasibility score to steer the sampling process toward refined actions that satisfy physical contact constraints. Furthermore, to facilitate TouchGuide training with high-quality and cost-effective data, we introduce TacUMI, a data collection system. TacUMI achieves a favorable trade-off between precision and affordability; by leveraging rigid fingertips, it obtains direct tactile feedback, thereby enabling the collection of reliable tactile data. Extensive experiments on five challenging contact-rich tasks, such as shoe lacing and chip handover, show that TouchGuide consistently and significantly outperforms state-of-the-art visuo-tactile policies.
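The two-stage sampling idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the straight-line velocity field standing in for the pre-trained policy, the quadratic feasibility score standing in for the learned CPM, and every name and parameter (`policy_velocity`, `feasibility_gradient`, `sample_action`, `switch_frac`, `guidance_scale`) are hypothetical and chosen only for illustration.

```python
import random


def policy_velocity(action, t, visual_target):
    """Toy stand-in for a pre-trained flow-matching policy (assumption):
    a straight-line velocity field that carries the sample to
    visual_target at t = 1."""
    return [(g - a) / (1.0 - t) for a, g in zip(action, visual_target)]


def feasibility_gradient(action, contact_target):
    """Gradient of a toy feasibility score s(a) = -||a - contact_target||^2.
    A learned CPM would replace this; the quadratic form is an assumption."""
    return [-2.0 * (a - c) for a, c in zip(action, contact_target)]


def sample_action(visual_target, contact_target, steps=20,
                  switch_frac=0.5, guidance_scale=0.0, seed=0):
    """Euler-integrate from Gaussian noise to an action.

    Stage 1 (t < switch_frac): follow the policy velocity only.
    Stage 2 (t >= switch_frac): add the scaled feasibility gradient.
    """
    rng = random.Random(seed)
    action = [rng.gauss(0.0, 1.0) for _ in visual_target]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so (1 - t) is never zero
        velocity = policy_velocity(action, t, visual_target)
        if t >= switch_frac and guidance_scale > 0.0:
            grad = feasibility_gradient(action, contact_target)
            velocity = [v + guidance_scale * g
                        for v, g in zip(velocity, grad)]
        action = [a + dt * v for a, v in zip(action, velocity)]
    return action


if __name__ == "__main__":
    visual = [1.0, 0.0]    # where the vision-only policy would land
    contact = [1.0, 0.2]   # where the toy contact model prefers to be
    print("unguided:", sample_action(visual, contact))
    print("guided:  ", sample_action(visual, contact, guidance_scale=1.0))
```

In this toy setup the unguided sample ends at the visual target, while the guided sample is nudged toward the contact-preferred action. How strongly the guidance shifts the result depends on the assumed scale and switch point, neither of which is specified in the abstract.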

Zhemeng Zhang, Jiahua Ma, Xincheng Yang, Xin Wen, Yuzhi Zhang, Boyan Li, Yiran Qin, Jin Liu, Can Zhao, Li Kang, Haoqin Hong, Zhenfei Yin, Philip Torr, Hao Su, Ruimao Zhang, Daolin Ma • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Chip Handover | Chip Handover 50 Demos Bi-Arx5 Dual-arm 1.0 (test) | Success Rate | 60 | 13 |
| Cucumber Peeling | Cucumber Peeling 50 Demos, Bi-Arx5 Dual-arm 1.0 (test) | Task Score | 97.5 | 13 |
| Lock Opening | Lock Opening 20 Demos Flexiv Rizon4 Single-arm 1.0 (test) | Success Rate | 30 | 13 |
| Multi-task Performance Aggregation | Combined Five Tasks (Shoe Lacing, Chip Handover, Cucum. Peeling, Vase Wiping, Lock Opening) 1.0 (average) | Average Performance | 58 | 13 |
| Shoe Lacing | Shoe Lacing 100 Demos, Bi-Arx5 Dual-arm 1.0 (test) | Success Rate | 35 | 13 |
| Vase Wiping | Vase Wiping 30 Demos Flexiv Rizon4 Single-arm 1.0 (test) | Task Score | 67.5 | 13 |
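The aggregate row can be checked against the five per-task rows. The short sketch below assumes the aggregate is an unweighted mean that mixes success rates and task scores; the listing does not state the formula, but the listed value of 58 is consistent with that reading.

```python
# Per-task results as listed in the benchmark rows.
results = {
    "Shoe Lacing": 35,
    "Chip Handover": 60,
    "Cucumber Peeling": 97.5,
    "Vase Wiping": 67.5,
    "Lock Opening": 30,
}

# Unweighted mean over the five tasks (assumed aggregation rule).
average = sum(results.values()) / len(results)
print(average)  # prints 58.0
```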