Segment Any Motion in Videos
About
Moving object segmentation is a crucial task for achieving a high-level understanding of visual scenes and has numerous downstream applications. Humans can effortlessly segment moving objects in videos. Previous work has largely relied on optical flow to provide motion cues; however, this approach often results in imperfect predictions due to challenges such as partial motion, complex deformations, motion blur and background distractions. We propose a novel approach for moving object segmentation that combines long-range trajectory motion cues with DINO-based semantic features and leverages SAM2 for pixel-level mask densification through an iterative prompting strategy. Our model employs Spatio-Temporal Trajectory Attention and Motion-Semantic Decoupled Embedding to prioritize motion while integrating semantic support. Extensive testing on diverse datasets demonstrates state-of-the-art performance, excelling in challenging scenarios and fine-grained segmentation of multiple objects. Our code is available at https://motion-seg.github.io/.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Object Segmentation | DAVIS 2016 | J-Measure | 81.9 | 44 |
| Video Object Segmentation | SegTrack v2 | IoU (J) | 76.3 | 34 |
| Moving Object Segmentation | DAVIS Moving 2016 | Jaccard Index | 90.6 | 26 |
| Novel View Synthesis | D-RE10K static regions only (test) | PSNR | 20.73 | 26 |
| Novel View Synthesis | D-RE10K-iPhone full-image fidelity (test) | PSNR | 20.01 | 26 |
| Video Object Segmentation | DAVIS 17 | J Score | 90 | 25 |
| Moving Object Segmentation | FBMS-59 | J-Measure | 78.3 | 20 |
| Motion Segmentation | D-RE10K | mIoU | 50.9 | 11 |
| Fine-grained Moving Object Segmentation | DAVIS Moving 17 | J & F Score | 80.5 | 4 |
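
Several of the results above are reported as a region similarity score (J, also called the Jaccard index or IoU): the overlap between a predicted mask and the ground-truth mask, divided by their union. The sketch below shows that computation for a single frame. It is an illustration only, not the evaluation code used for these benchmarks; the function name `jaccard_index`, the nested-list mask format, and the choice to score two empty masks as 1.0 are assumptions made for this example.

```python
def jaccard_index(pred, gt):
    """Region similarity J (intersection over union) of two binary masks.

    Masks are nested lists of 0/1 values with identical shapes
    (an assumed format chosen to keep the example dependency-free).
    """
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p, g = bool(p), bool(g)
            intersection += p and g  # pixel is foreground in both masks
            union += p or g          # pixel is foreground in at least one mask

    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered by either mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
gt = [[0, 1, 0],
      [0, 1, 1]]
print(jaccard_index(pred, gt))  # 0.5
```

Benchmark tables commonly report this score averaged over frames and objects and scaled to a percentage, so a per-frame value of 0.5 would correspond to 50.0 in the table's units.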