Tracking Anything with Decoupled Video Segmentation

About

Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks including large-vocabulary video panoptic segmentation, open-world video segmentation, referring video segmentation, and unsupervised video object segmentation. Code is available at: https://hkchengrex.github.io/Tracking-Anything-with-DEVA
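The decoupled idea in the abstract can be illustrated with a toy fusion step: segments propagated from earlier frames are matched against fresh image-level detections, matched tracks keep their identity, and unmatched detections start new tracks. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the names `iou`, `fuse`, and `rect`, the greedy matching, the 0.5 IoU threshold, and the representation of masks as sets of pixel coordinates are all hypothetical choices made for this example.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]


def rect(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Hypothetical helper: a rectangular mask covering rows r0..r1-1, cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel sets (0.0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fuse(propagated: Dict[int, Mask], detections: List[Mask],
         threshold: float = 0.5) -> Dict[int, Mask]:
    """Greedily associate image-level detections with propagated tracks.

    Assumption for this sketch: a matched track adopts the detection's mask,
    an unmatched track keeps its propagated mask, and an unmatched detection
    receives a new track ID.
    """
    fused = dict(propagated)
    next_id = max(propagated, default=0) + 1
    used = set()  # track IDs already claimed by a detection
    for det in detections:
        best_id, best_iou = None, 0.0
        for track_id, mask in propagated.items():
            if track_id in used:
                continue
            score = iou(mask, det)
            if score > best_iou:
                best_id, best_iou = track_id, score
        if best_id is not None and best_iou >= threshold:
            fused[best_id] = det
            used.add(best_id)
        else:
            fused[next_id] = det
            next_id += 1
    return fused


# Track 1 overlaps the first detection (IoU 0.6); the second detection is new.
propagated = {1: rect(0, 0, 4, 4), 2: rect(10, 10, 14, 14)}
detections = [rect(0, 1, 4, 5), rect(20, 20, 22, 22)]
fused = fuse(propagated, detections)
print(sorted(fused))  # track IDs after fusion
```

In this toy setting the image-level detector and the temporal propagation model are treated as black boxes that produce `detections` and `propagated`; only the association step between them is shown.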

Ho Kei Cheng, Seoung Wug Oh, Brian Price, Alexander Schwing, Joon-Young Lee • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Object Segmentation | DAVIS 2017 (val) | J mean: 84.2 | 1193 |
| Video Instance Segmentation | YouTube-VIS 2019 (val) | AP: 40.8 | 604 |
| Referring Video Object Segmentation | Ref-YouTube-VOS (val) | J&F Score: 66 | 244 |
| Video Object Segmentation | YouTube-VOS 2019 (val) | J-Score (Seen): 85.4 | 231 |
| Referring Video Object Segmentation | Ref-DAVIS 2017 (val) | J&F: 66.3 | 205 |
| Video Object Segmentation | SA-V (val) | J&F Score: 55.4 | 114 |
| Video Object Segmentation | SA-V (test) | J&F: 56.2 | 110 |
| Unsupervised Video Object Segmentation | DAVIS 2016 (val) | -- | 108 |
| Video Object Segmentation | MOSE (val) | J&F Score: 66 | 45 |
| Referring Video Object Segmentation | YoURVOS (test) | J&F: 21.9 | 40 |
Showing 10 of 38 rows
