Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions
About
We address the problem of tactile localization, where the goal is to identify image regions that share the same material properties as a tactile input. Existing visuo-tactile methods rely on global alignment and thus fail to capture the fine-grained local correspondences required for this task. The challenge is amplified by existing datasets, which predominantly contain close-up, low-diversity images. We propose a model that learns local visuo-tactile alignment via dense cross-modal feature interactions, producing tactile saliency maps for touch-conditioned material segmentation. To overcome dataset constraints, we introduce: (i) in-the-wild multi-material scene images that expand visual diversity, and (ii) a material-diversity pairing strategy that aligns each tactile sample with visually varied yet tactilely consistent images, improving contextual localization and robustness to weak signals. We also construct two new tactile-grounded material segmentation datasets for quantitative evaluation. Experiments on both new and existing benchmarks show that our approach substantially outperforms prior visuo-tactile methods in tactile localization.
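As a rough illustration of the dense cross-modal interaction described above, the sketch below shows one way a pooled tactile feature could attend over a dense grid of visual features to yield a touch-conditioned saliency map. This is not the paper's released implementation; the module names, feature dimensions, and cosine-similarity readout are assumptions for illustration only.

```python
# Minimal sketch (not the authors' code): a tactile feature is compared against every
# spatial location of a visual feature map, and the similarity grid is read out as a
# tactile saliency map. All names and dimensions here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseCrossModalSaliency(nn.Module):
    def __init__(self, vis_dim=512, tac_dim=512, embed_dim=256):
        super().__init__()
        self.vis_proj = nn.Conv2d(vis_dim, embed_dim, kernel_size=1)  # project dense visual features
        self.tac_proj = nn.Linear(tac_dim, embed_dim)                 # project pooled tactile feature

    def forward(self, vis_feat, tac_feat):
        # vis_feat: (B, Cv, H, W) feature map from an image backbone
        # tac_feat: (B, Ct) global feature from a tactile encoder (e.g. a GelSight image encoder)
        v = F.normalize(self.vis_proj(vis_feat), dim=1)   # (B, D, H, W)
        t = F.normalize(self.tac_proj(tac_feat), dim=1)   # (B, D)
        # Dense cosine similarity between the tactile query and every visual location
        sal = torch.einsum('bdhw,bd->bhw', v, t)          # (B, H, W)
        # Upsample to image resolution and squash to [0, 1] as a saliency map
        sal = F.interpolate(sal.unsqueeze(1), scale_factor=32,
                            mode='bilinear', align_corners=False)
        return torch.sigmoid(sal)                         # (B, 1, 32H, 32W)
```

A map like this can be thresholded into a binary mask for touch-conditioned material segmentation, which is how the mIoU numbers below are typically computed.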
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Tactile Localization | TG annotated version (test) | mIoU | 77.58 | 10 |
| Tactile Localization | Web-Material (test) | mIoU | 60.94 | 10 |
| Tactile Localization | OpenSurfaces (test) | mIoU | 36.73 | 10 |
| Interactive Localization | Web-Material-Interactive (test) | IIoU | 37 | 8 |
| Material Classification | Touch-and-Go (original) | Accuracy | 67.77 | 7 |
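For reference, the mIoU scores above are mean intersection-over-union between predicted and ground-truth material regions. The sketch below shows one plausible way such a score could be computed; the 0.5 threshold and per-pair averaging are assumptions, not necessarily the benchmarks' exact protocol.

```python
# Minimal sketch of an IoU-style metric for tactile localization, assuming the predicted
# saliency map is thresholded at 0.5 and compared against a binary material mask.
import numpy as np

def binary_iou(pred, gt, thresh=0.5):
    """IoU between a thresholded saliency map and a ground-truth material mask."""
    p = pred >= thresh
    g = gt.astype(bool)
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / union if union > 0 else 1.0

def mean_iou(preds, gts):
    """mIoU: average IoU over all tactile-image pairs in a test set."""
    return float(np.mean([binary_iou(p, g) for p, g in zip(preds, gts)]))
```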