
TubeFormer-DeepLab: Video Mask Transformer

About

We present TubeFormer-DeepLab, the first attempt to tackle multiple core video segmentation tasks in a unified manner. Different video segmentation tasks (e.g., video semantic/instance/panoptic segmentation) are usually considered as distinct problems. State-of-the-art models adopted in the separate communities have diverged, and radically different approaches dominate in each task. By contrast, we make a crucial observation that video segmentation tasks could be generally formulated as the problem of assigning different predicted labels to video tubes (where a tube is obtained by linking segmentation masks along the time axis) and the labels may encode different values depending on the target task. The observation motivates us to develop TubeFormer-DeepLab, a simple and effective video mask transformer model that is widely applicable to multiple video segmentation tasks. TubeFormer-DeepLab directly predicts video tubes with task-specific labels (either pure semantic categories, or both semantic categories and instance identities), which not only significantly simplifies video segmentation models, but also advances state-of-the-art results on multiple video segmentation benchmarks.
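The tube formulation described above can be illustrated with a small sketch. This is not the TubeFormer-DeepLab model or its code; it only shows, under simplifying assumptions, how a set of predicted tubes with task-specific labels could be rendered into a per-pixel label video. The names `Tube`, `render`, and the `void` value are hypothetical and chosen for illustration, and tubes are assumed not to overlap.

```python
from dataclasses import dataclass
from typing import List, Optional

Frame = List[List[int]]  # one H x W binary mask (1 = pixel belongs to the tube)


@dataclass
class Tube:
    """A video tube: per-frame masks linked along the time axis, plus a label."""
    masks: List[Frame]                  # T masks, one per frame
    semantic_class: int                 # semantic category of the tube
    instance_id: Optional[int] = None   # None when the task is purely semantic


def render(tubes: List[Tube], num_frames: int, height: int, width: int, void=-1):
    """Paint each tube's label into a T x H x W label video.

    Pixels covered by no tube keep the `void` value (an assumption of this sketch).
    """
    video = [[[void] * width for _ in range(height)] for _ in range(num_frames)]
    for tube in tubes:
        # The label encodes different values depending on the target task:
        # a class alone (semantic) or a (class, instance) pair (instance/panoptic).
        if tube.instance_id is None:
            label = tube.semantic_class
        else:
            label = (tube.semantic_class, tube.instance_id)
        for t, mask in enumerate(tube.masks):
            for y, row in enumerate(mask):
                for x, on in enumerate(row):
                    if on:
                        video[t][y][x] = label
    return video


# Toy clip: 2 frames of size 1 x 3, with two objects of the same class (3).
masks_a = [[[1, 0, 0]], [[1, 1, 0]]]
masks_b = [[[0, 0, 1]], [[0, 0, 1]]]

# Panoptic-style labels: semantic class plus instance identity.
panoptic = render([Tube(masks_a, 3, 0), Tube(masks_b, 3, 1)], 2, 1, 3)

# Semantic-style labels: the same tubes, with the class only.
semantic = render([Tube(masks_a, 3), Tube(masks_b, 3)], 2, 1, 3)
```

In this toy setting the same tube masks serve both tasks; only the label attached to each tube changes, which is the observation the abstract builds on.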

Dahun Kim, Jun Xie, Huiyu Wang, Siyuan Qiao, Qihang Yu, Hong-Seok Kim, Hartwig Adam, In So Kweon, Liang-Chieh Chen • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| Video Instance Segmentation | YouTube-VIS 2019 (val) | AP 47.5 | 567 |
| Video Instance Segmentation | YouTube-VIS 2021 (val) | AP 41.2 | 344 |
| Video Semantic Segmentation | VSPW (val) | mIoU 63.2 | 92 |
| Video Instance Segmentation | YouTube-VIS 2019 | AP 47.5 | 75 |
| Video Panoptic Segmentation | VIPSeg (val) | VPQ 31.2 | 73 |
| Video Instance Segmentation | YouTube-VIS 2021 | AP 41.2 | 63 |
| Video Semantic Segmentation | VSPW | mIoU 63.2 | 25 |
| Video Panoptic Segmentation | KITTI-STEP (val) | STQ 70 | 22 |
| Video Panoptic Segmentation | KITTI-STEP (test) | STQ 65.25 | 15 |
| Video Panoptic Segmentation | VIPSeg (test) | STQ 38.6 | 15 |
Showing 10 of 13 rows
