
SAM3-I: Segment Anything with Instructions

About

Segment Anything Model 3 (SAM3) advances open-vocabulary segmentation through promptable concept segmentation, enabling users to segment all instances associated with a given concept using short noun-phrase (NP) prompts. While effective for concept-level grounding, real-world interactions often involve far richer natural-language instructions that combine attributes, relations, actions, states, or implicit reasoning. Currently, SAM3 relies on external multi-modal agents to convert complex instructions into NPs and conducts iterative mask filtering, leading to coarse representations and limited instance specificity. In this work, we present SAM3-I, an instruction-following extension of the SAM family that unifies concept-level grounding and instruction-level reasoning within a single segmentation framework. Built upon SAM3, SAM3-I introduces an instruction-aware cascaded adaptation mechanism with dedicated alignment losses that progressively aligns expressive instruction semantics with SAM3's vision-language representations, enabling direct interpretation of natural-language instructions while preserving its strong concept recall ability. To enable instruction-following learning, we introduce HMPL-Instruct, a large-scale instruction-centric dataset that systematically covers hierarchical instruction semantics and diverse target granularities. Experiments demonstrate that SAM3-I achieves appealing performance across referring and reasoning-based segmentation, showing that SAM3 can be effectively extended to follow complex natural-language instructions without sacrificing its original concept-driven strengths. Code and dataset are available at https://github.com/debby-0527/SAM3-I.

Jingjing Li, Yue Feng, Yuchen Guo, Jincai Huang, Wei Ji, Qi Bi, Yongri Piao, Miao Zhang, Xiaoqi Zhao, Qiang Chen, Shihao Zou, Huchuan Lu, Li Cheng • 2025

Related benchmarks

Task | Dataset | Result | Rank
Reasoning Segmentation | Intent2Part InstructPart | mIoU 68.9 | 9
Referring Segmentation | Intent2Part InstructPart | mIoU 70.1 | 9
Intent-level Segmentation | Intent2Part clean (test) | mIoU 34.2 | 9
Intent-level Segmentation | Intent2Part (test) | mIoU 31.8 | 9
Complex Instruction-Following Segmentation | PACO-LVIS-Instruct Complex (test) | gIoU 51 | 3
Simple Instruction-Following Segmentation | PACO-LVIS-Instruct Simple Instruct. (test) | gIoU 54 | 3
Concept-level Grounding | PACO-LVIS-Instruct Concept-level (test) | gIoU 48.9 | 2
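
The results above are intersection-over-union (IoU) scores. As a rough illustration only, the sketch below computes a per-sample IoU between binary masks and averages it over a set of samples, which is one common reading of scores such as mIoU and gIoU. The exact evaluation protocol behind the listed numbers is not stated on this page, so the function names, the nested-list mask format, and the convention of scoring two empty masks as 1.0 are all assumptions made for this example.

```python
from typing import Sequence, Tuple

# Assumed mask format: rows of 0/1 values (hypothetical, for illustration only).
Mask = Sequence[Sequence[int]]


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count the pixels in the intersection and union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter, union


def iou(pred: Mask, target: Mask, empty_value: float = 1.0) -> float:
    """IoU of one predicted mask against its ground truth.

    When both masks are empty the union is zero; returning `empty_value`
    in that case is an assumed convention, not taken from the source.
    """
    inter, union = intersection_and_union(pred, target)
    return inter / union if union else empty_value


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average the per-sample IoU over a non-empty set of samples."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
    targets = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
    # Multiply by 100 to compare with percentage-style leaderboard scores.
    print(100 * mean_iou(preds, targets))
```

Averaging per-sample IoU weights every image equally regardless of object size. A cumulative variant that sums intersections and unions over the whole set before dividing would instead favor large objects.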