SAM3-DMS: Decoupled Memory Selection for Multi-target Video Segmentation of SAM3

About

Segment Anything 3 (SAM3) has established a powerful foundation that robustly detects, segments, and tracks specified targets in videos. However, in its original implementation, its group-level collective memory selection is suboptimal for complex multi-object scenarios, as it employs a synchronized decision across all concurrent targets conditioned on their average performance, often overlooking individual reliability. To this end, we propose SAM3-DMS, a training-free decoupled strategy that utilizes fine-grained memory selection on individual objects. Experiments demonstrate that our approach achieves robust identity preservation and tracking stability. Notably, our advantage becomes more pronounced with increased target density, establishing a solid foundation for simultaneous multi-target video segmentation in the wild.
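
The contrast between group-level and decoupled selection can be illustrated with a minimal sketch. This is not the SAM3 or SAM3-DMS implementation: the function names, the per-object reliability scores, and the threshold value are all hypothetical, chosen only to show how a decision based on the group average can admit an unreliable target into memory while a per-object decision does not.

```python
from typing import List


def group_level_select(scores: List[float], threshold: float) -> List[bool]:
    """Synchronized decision (hypothetical): one verdict from the mean score,
    applied identically to every concurrent target."""
    if not scores:
        return []
    keep = sum(scores) / len(scores) >= threshold
    return [keep] * len(scores)


def decoupled_select(scores: List[float], threshold: float) -> List[bool]:
    """Decoupled decision (hypothetical): each target's frame enters memory
    only if that target's own score passes the threshold."""
    return [s >= threshold for s in scores]


# Hypothetical per-object reliability scores for one frame: two well-tracked
# targets and one unreliable target (e.g. occluded).
scores = [0.95, 0.90, 0.20]
threshold = 0.6  # assumed value, for illustration only

group_decision = group_level_select(scores, threshold)
per_object_decision = decoupled_select(scores, threshold)

print(group_decision)       # the high average lets the unreliable target through
print(per_object_decision)  # the unreliable target is filtered on its own
```

In this toy case the group average is about 0.68, so the synchronized rule writes all three targets to memory, including the one scoring 0.20. The per-object rule keeps only the first two.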

Ruiqi Shen, Chang Liu, Henghui Ding • 2026

Related benchmarks

Task                             Dataset                 Metric     Result  Rank
Video Object Segmentation        SA-V (val)              J&F Score  83.3    74
Promptable Video Segmentation    SA-V (test)             J&F Score  84.3    4
Promptable Video Segmentation    MOSE v2 (val)           J&F Score  60.3    4
Promptable Concept Segmentation  SA-V (val)              cgF1       29.4    3
Promptable Concept Segmentation  YT-Temporal-1B (val)    cgF1       50.3    3
Promptable Concept Segmentation  YT-Temporal-1B (test)   cgF1       51      3
Promptable Concept Segmentation  SmartGlasses (val)      cgF1       33.6    3
Promptable Concept Segmentation  SmartGlasses (test)     cgF1       36.5    3
Promptable Concept Segmentation  SA-V (test)             cgF1       30.3    3