Segment Any Motion in Videos

About

Moving object segmentation is a crucial task for achieving a high-level understanding of visual scenes and has numerous downstream applications. Humans can effortlessly segment moving objects in videos. Previous work has largely relied on optical flow to provide motion cues; however, this approach often results in imperfect predictions due to challenges such as partial motion, complex deformations, motion blur and background distractions. We propose a novel approach for moving object segmentation that combines long-range trajectory motion cues with DINO-based semantic features and leverages SAM2 for pixel-level mask densification through an iterative prompting strategy. Our model employs Spatio-Temporal Trajectory Attention and Motion-Semantic Decoupled Embedding to prioritize motion while integrating semantic support. Extensive testing on diverse datasets demonstrates state-of-the-art performance, excelling in challenging scenarios and fine-grained segmentation of multiple objects. Our code is available at https://motion-seg.github.io/.

Nan Huang, Wenzhao Zheng, Chenfeng Xu, Kurt Keutzer, Shanghang Zhang, Angjoo Kanazawa, Qianqian Wang • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Video Object Segmentation	DAVIS 2016	J-Measure	81.9	50
Video Object Segmentation	SegTrack v2	IoU (J)	76.3	34
Unsupervised Video Object Segmentation	DAVIS 2016	Jaccard Score	90.6	32
Moving Object Segmentation	DAVIS Moving 2016	Jaccard Index	90.6	26
Novel View Synthesis	D-RE10K static regions only (test)	PSNR	20.73	26
Novel View Synthesis	D-RE10K-iPhone full-image fidelity (test)	PSNR	20.01	26
Video Object Segmentation	DAVIS 17	J Score	90	25
Video Object Segmentation	FBMS	J-Score	78.3	25
Moving Object Segmentation	FBMS-59	J-Measure	78.3	20
Moving Object Segmentation	DAVIS M 17	Jaccard Index (J)	90	12

Showing 10 of 23 rows
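
Most segmentation rows above report the Jaccard index (also written J or IoU): the area where the predicted and ground-truth masks overlap, divided by the area covered by either mask. The sketch below is a minimal, standard-library-only illustration of that metric, not the evaluation code used by any of these benchmarks. The function names `jaccard` and `mean_jaccard`, the nested-list mask format, and the convention of scoring two empty masks as 1.0 are assumptions made for the example. The table values appear to be percentages, so a score of 0.5 here would correspond to 50.

```python
from typing import Sequence

# Binary mask as rows of 0/1 values: 1 = moving object, 0 = background.
# (Illustrative format; real pipelines typically use arrays.)
Mask = Sequence[Sequence[int]]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    # Pair up corresponding pixels from the two masks.
    pairs = [(p, g) for p_row, g_row in zip(pred, gt) for p, g in zip(p_row, g_row)]
    inter = sum(1 for p, g in pairs if p and g)   # pixels set in both masks
    union = sum(1 for p, g in pairs if p or g)    # pixels set in either mask
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_jaccard(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average the per-frame Jaccard index over a video sequence."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need the same non-zero number of frames")
    scores = [jaccard(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    gt = [[1, 0, 0],
          [0, 1, 1]]
    # 2 overlapping pixels, 4 pixels in the union.
    print(jaccard(pred, gt))  # 0.5
```

Benchmarks differ in how they aggregate per-frame scores (per sequence, per object, or over the whole dataset), so a reproduction should follow the official protocol of each dataset.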
