SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning

About

Leveraging multimodal large models for image segmentation has become a prominent research direction. However, existing approaches typically rely heavily on manually annotated datasets that include explicit reasoning processes, which are costly and time-consuming to produce. Recent advances suggest that reinforcement learning (RL) can endow large models with reasoning capabilities without requiring such reasoning-annotated data. In this paper, we propose SAM-R1, a novel framework that enables multimodal large models to perform fine-grained reasoning in image understanding tasks. Our approach is the first to incorporate fine-grained segmentation settings during the training of multimodal reasoning models. By integrating task-specific, fine-grained rewards with a tailored optimization objective, we further enhance the model's reasoning and segmentation alignment. We also leverage the Segment Anything Model (SAM) as a strong and flexible reward provider to guide the learning process. With only 3k training samples, SAM-R1 achieves strong performance across multiple benchmarks, demonstrating the effectiveness of reinforcement learning in equipping multimodal models with segmentation-oriented reasoning capabilities.
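The abstract does not spell out how SAM's output becomes a reward, so the following is only a minimal sketch under one plausible assumption: the mask SAM produces from the model's output is compared with the ground-truth mask, and the overlap (IoU) is mapped to a graded reward. The function names and the threshold values are hypothetical and are not taken from the paper.

```python
def mask_iou(pred, gt):
    """IoU between two binary masks given as equal-sized nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumption: two empty masks yield no reward signal.
    return inter / union if union else 0.0


def tiered_reward(iou, thresholds=(0.5, 0.7, 0.9)):
    """Graded reward: fraction of (hypothetical) IoU thresholds that are met."""
    return sum(1 for t in thresholds if iou >= t) / len(thresholds)


# Toy example: in a real pipeline pred_mask would come from SAM.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
gt_mask = [[1, 1, 0],
           [0, 0, 0]]

iou = mask_iou(pred_mask, gt_mask)   # 2 overlapping pixels / 3 in the union
reward = tiered_reward(iou)          # only the 0.5 threshold is met
```

A tiered scheme like this gives partial credit for coarse masks while still rewarding tighter ones more, which is one way to read "fine-grained rewards"; a continuous reward equal to the IoU itself would be an equally reasonable reading.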

Jiaqi Huang, Zunnan Xu, Jun Zhou, Ting Liu, Yicheng Xiao, Mingwen Ou, Bowen Ji, Xiu Li, Kehong Yuan • 2025

Related benchmarks

Task | Dataset | Result | Rank
Referring Expression Segmentation | RefCOCO (testA) | -- | 217
Referring Expression Segmentation | RefCOCO+ (testA) | -- | 190
Reasoning Segmentation | ReasonSeg (val) | cIoU 55.8 | 145
Reasoning Segmentation | ReasonSeg (test) | gIoU 60.2 | 102
Referring Segmentation | RefCOCO (val) | cIoU 79.2 | 51
Referring Expression Segmentation | RefCOCOg UMD (test) | mIoU 73.1 | 13
Socio-name Segmentation | SocioSeg (test) | cIoU 25.6 | 10
Socio-semantic Segmentation | SocioSeg (test) | cIoU 22.5 | 10
Socio-semantic Segmentation | SocioSeg OOD (New Region) | cIoU 0.148 | 10
Socio-class Segmentation | SocioSeg (test) | cIoU 22.3 | 10
Showing 10 of 13 rows
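
The page does not define the metrics in the Result column. As a rough guide, the sketch below follows the convention commonly used in the reasoning-segmentation literature: gIoU averages the per-image IoU, while cIoU divides the intersection summed over all images by the union summed over all images. The helper names are hypothetical, and scoring a pair of empty masks as 1.0 is an assumption.

```python
def intersection_and_union(pred, gt):
    """Pixel counts of intersection and union for two binary masks (nested 0/1 lists)."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    return inter, union


def giou(pairs):
    """Mean of per-image IoU over (pred, gt) pairs."""
    scores = []
    for pred, gt in pairs:
        inter, union = intersection_and_union(pred, gt)
        # Assumption: two empty masks count as perfect agreement.
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


def ciou(pairs):
    """Cumulative intersection divided by cumulative union over all pairs."""
    total_inter = 0
    total_union = 0
    for pred, gt in pairs:
        inter, union = intersection_and_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 1.0


# Toy data: one small object (IoU 2/3) and one larger object (IoU 8/10).
small = ([[1, 1, 1]], [[1, 1, 0]])
large = ([[1] * 8 + [1, 0]], [[1] * 8 + [0, 1]])
pairs = [small, large]

g = giou(pairs)   # (2/3 + 8/10) / 2
c = ciou(pairs)   # (2 + 8) / (3 + 10)
```

Because cIoU pools pixels before dividing, large objects dominate it, whereas gIoU weights every image equally; the two can therefore rank methods differently on the same dataset. Both functions return fractions in [0, 1], so multiply by 100 to compare with percentage-style scores.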
