RepSAM: Bridging Foundation Models to Robotic Vision via Representation-Guided Adaptation

About

Robotic perception in unstructured environments remains challenging despite the zero-shot capabilities of foundation models such as SAM. This work attributes performance degradation to non-uniform representation shifts across transformer layers: shallow layers exhibit substantial domain gaps (CKA < 0.5), whereas deep layers transfer effectively (CKA > 0.7). Based on this observation, we propose RepSAM, a representation-guided parameter-efficient fine-tuning (PEFT) framework for adapting foundation models to robotic vision. RepSAM employs a theoretically grounded CKA-guided rank allocation strategy combined with a multi-modal fusion module for robust handling of challenging robotic scenarios, including transparent objects and cluttered scenes. Experimental evaluation across six benchmarks and robotic manipulation tasks demonstrates that RepSAM achieves 97.9% of full fine-tuning performance (89.0% vs. 90.9% mIoU) while reducing trainable parameters by 158x (from 632M to 4.0M). RepSAM outperforms DoRA by 7.9% mIoU with just 4 hours of training on a single A100 GPU (a 96x reduction from full fine-tuning, which takes 384 GPU-hours). These improvements are statistically significant (p < 0.01) and translate to a 12.0% absolute improvement in robotic manipulation success rates over the LoRA (RGB) baseline.
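The abstract's layer-wise similarity measure can be illustrated with a minimal sketch of linear CKA (centered kernel alignment) in NumPy. The function `allocate_rank`, its bounds `r_min` and `r_max`, and the rule that lower CKA maps to a higher adapter rank are hypothetical illustrations, not the allocation strategy used by RepSAM, which the abstract does not spell out. The sketch also assumes both feature matrices were extracted for the same set of inputs.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two feature matrices of shape (n_samples, dim).

    Both matrices must describe the same n_samples inputs; the feature
    dimensions may differ.
    """
    # Center each feature column so the similarity ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    self_x = np.linalg.norm(x.T @ x, ord="fro")
    self_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (self_x * self_y))


def allocate_rank(cka, r_min=4, r_max=32):
    """Hypothetical rule (an assumption): lower CKA -> higher adapter rank."""
    cka = min(max(cka, 0.0), 1.0)  # clamp to the valid CKA range
    return int(round(r_min + (1.0 - cka) * (r_max - r_min)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_a = rng.normal(size=(64, 16))  # stand-in for one layer's features
    feats_b = rng.normal(size=(64, 16))  # stand-in for shifted features
    score = linear_cka(feats_a, feats_b)
    print(f"CKA = {score:.3f}, illustrative rank = {allocate_rank(score)}")
```

Under this illustrative rule, a shallow layer with CKA below 0.5 would receive a larger rank than a deep layer with CKA above 0.7, which matches the direction of the domain gap described in the abstract.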

Wenhui Chu • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robotic Image Segmentation | OCID | mIoU | 91.8 | 42 |
| Semantic segmentation | YCB-V | mIoU | 88 | 23 |
| Semantic segmentation | ClearGrasp | mIoU | 90.1 | 15 |
| Semantic segmentation | GraspNet | mIoU | 89 | 15 |
| Semantic segmentation | WISDOM | mIoU | 90.2 | 15 |
| Semantic segmentation | LineMOD | mIoU | 84.9 | 15 |
| Robotic Scene Segmentation | Robotic Benchmarks Average of OCID, ClearGrasp, GraspNet, WISDOM, YCB-Video, and LINEMOD | Change in mIoU (%) | 16.8 | 3 |
| Object Grasping | PyBullet Standard Scenario (test) | mIoU | 90.5 | 2 |
| Object Grasping | PyBullet Transparent Scenario (test) | mIoU | 85.3 | 2 |
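As a quick consistency check, the headline ratios in the abstract and the mean of the six per-dataset mIoU values listed above can be recomputed directly. This is only a sketch of the arithmetic; the variable names are illustrative, and the page does not state that the abstract's 89.0% is the six-benchmark mean, although the numbers coincide.

```python
# Per-dataset mIoU values as listed in the benchmark table.
per_dataset_miou = {
    "OCID": 91.8,
    "YCB-V": 88.0,
    "ClearGrasp": 90.1,
    "GraspNet": 89.0,
    "WISDOM": 90.2,
    "LineMOD": 84.9,
}

mean_miou = sum(per_dataset_miou.values()) / len(per_dataset_miou)

# Figures quoted in the abstract.
repsam_miou, full_ft_miou = 89.0, 90.9
full_params_m, repsam_params_m = 632.0, 4.0
full_gpu_hours, repsam_gpu_hours = 384.0, 4.0

relative_perf = 100.0 * repsam_miou / full_ft_miou
param_reduction = full_params_m / repsam_params_m
time_reduction = full_gpu_hours / repsam_gpu_hours

print(f"mean mIoU over six datasets: {mean_miou:.1f}")
print(f"share of full fine-tuning performance: {relative_perf:.1f}%")
print(f"trainable-parameter reduction: {param_reduction:.0f}x")
print(f"training-time reduction: {time_reduction:.0f}x")
```

The recomputed values agree with the abstract: a mean of 89.0 mIoU, 97.9% of full fine-tuning performance, a 158x parameter reduction, and a 96x reduction in GPU-hours.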
