Tracking Anything with Decoupled Video Segmentation
About
Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks including large-vocabulary video panoptic segmentation, open-world video segmentation, referring video segmentation, and unsupervised video object segmentation. Code is available at: https://hkchengrex.github.io/Tracking-Anything-with-DEVA
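The decoupling described above can be illustrated with a toy sketch. It is not the DEVA implementation: it only shows how a task-specific image-level segmenter and a task-agnostic propagation step can be kept separate and fused by overlap. All names here (`iou`, `merge`, `track`, `segment_image`, `propagate`, `detect_every`, `thresh`) are hypothetical, masks are plain sets of pixel coordinates, and the fusion is a simple online IoU match rather than the bi-directional, semi-online consensus the abstract refers to.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Pixel = Tuple[int, int]
Mask = FrozenSet[Pixel]
Tracks = Dict[int, Mask]  # track id -> current mask


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def merge(propagated: Tracks, detected: List[Mask],
          next_id: int, thresh: float = 0.5) -> Tuple[Tracks, int]:
    """Fuse propagated tracks with fresh image-level detections.

    Assumption: a detection that overlaps an unmatched track by at least
    `thresh` IoU refreshes that track; otherwise it starts a new track.
    """
    merged = dict(propagated)  # tracks without a matching detection are kept
    used = set()
    for det in detected:
        best_id, best_iou = None, 0.0
        for tid, mask in propagated.items():
            if tid in used:
                continue
            score = iou(mask, det)
            if score > best_iou:
                best_id, best_iou = tid, score
        if best_id is not None and best_iou >= thresh:
            merged[best_id] = det  # same object: keep the id, refresh the mask
            used.add(best_id)
        else:
            merged[next_id] = det  # unseen object: assign a new id
            next_id += 1
    return merged, next_id


def track(frames: Sequence,
          segment_image: Callable[[object], List[Mask]],
          propagate: Callable[[Tracks, object, object], Tracks],
          detect_every: int = 5) -> List[Tracks]:
    """Run the task-specific segmenter sparsely and propagate in between."""
    tracks: Tracks = {}
    next_id = 0
    outputs: List[Tracks] = []
    for t, frame in enumerate(frames):
        if t > 0:
            # class/task-agnostic step: carry existing masks to the new frame
            tracks = propagate(tracks, frames[t - 1], frame)
        if t % detect_every == 0:
            # task-specific step: image-level hypotheses for this frame
            tracks, next_id = merge(tracks, segment_image(frame), next_id)
        outputs.append(dict(tracks))
    return outputs
```

In this sketch, changing the target task only means passing a different `segment_image` callable; `propagate` and `merge` stay untouched, which is the property the decoupled formulation relies on.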
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Video Object Segmentation | DAVIS 2017 (val) | J mean | 84.2 | 1130 |
| Video Instance Segmentation | YouTube-VIS 2019 (val) | AP | 40.8 | 567 |
| Video Object Segmentation | YouTube-VOS 2019 (val) | J-Score (Seen) | 85.4 | 231 |
| Referring Video Object Segmentation | Ref-YouTube-VOS (val) | J&F Score | 66 | 200 |
| Referring Video Object Segmentation | Ref-DAVIS 2017 (val) | J&F | 66.3 | 178 |
| Unsupervised Video Object Segmentation | DAVIS 2016 (val) | -- | -- | 108 |
| Video Object Segmentation | SA-V (val) | J&F Score | 55.4 | 74 |
| Video Object Segmentation | SA-V (test) | J&F | 56.2 | 70 |
| Video Object Segmentation | MOSE (val) | J&F Score | 66 | 45 |
| Semi-supervised Video Object Segmentation | DAVIS 2017 (val) | J&F Score | 87 | 31 |