Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

About

We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance. Trained separately, the interaction module converts user interactions to an object mask, which is then temporally propagated by our propagation module using a novel top-$k$ filtering strategy in reading the space-time memory. To effectively take the user's intent into account, a novel difference-aware module is proposed to learn how to properly fuse the masks before and after each interaction, which are aligned with the target frames by employing the space-time memory. We evaluate our method both qualitatively and quantitatively with different forms of user interactions (e.g., scribbles, clicks) on DAVIS to show that our method outperforms current state-of-the-art algorithms while requiring fewer frame interactions, with the additional advantage of generalizing to different types of user interactions. We contribute a large-scale synthetic VOS dataset with pixel-accurate segmentation of 4.8M frames to accompany our source code to facilitate future research.
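The top-$k$ filtering idea mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `topk_memory_read`, the array shapes, and the use of a plain dot-product affinity are assumptions made for illustration. The sketch keeps only the k best-matching memory locations for each query location, applies a softmax over those k scores, and reads out a weighted sum of the memory values.

```python
import numpy as np

def topk_memory_read(query_key, memory_keys, memory_values, k):
    """Hypothetical top-k filtered memory read (illustrative only).

    query_key:     (C, Nq) key features of the query frame
    memory_keys:   (C, Nm) key features stored in memory
    memory_values: (D, Nm) value features stored in memory
    Returns a (D, Nq) readout.
    """
    # Dot-product affinity between every memory and query location (assumed).
    affinity = (memory_keys.T @ query_key).astype(float)  # (Nm, Nq)
    k = min(k, affinity.shape[0])

    # Indices of the k largest affinities for each query location.
    top_idx = np.argpartition(affinity, -k, axis=0)[-k:, :]  # (k, Nq)
    top_val = np.take_along_axis(affinity, top_idx, axis=0)

    # Numerically stable softmax restricted to the k retained entries.
    top_val = top_val - top_val.max(axis=0, keepdims=True)
    top_w = np.exp(top_val)
    top_w /= top_w.sum(axis=0, keepdims=True)

    # Scatter the weights back; all filtered-out entries stay at zero.
    weights = np.zeros_like(affinity)
    np.put_along_axis(weights, top_idx, top_w, axis=0)

    return memory_values @ weights  # (D, Nq)
```

With `k` equal to the number of memory locations, this reduces to an ordinary softmax read over the whole memory. Smaller values of `k` discard low-affinity entries before normalization.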

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Video Object Segmentation | DAVIS 2017 (val) | J mean | 81.7 | 1130 |
| Video Object Segmentation | DAVIS 2016 (val) | J Mean | 89.7 | 564 |
| Video Object Segmentation | YouTube-VOS 2018 (val) | J Score (Seen) | 81.1 | 493 |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Region J Mean | 74.9 | 237 |
| Video Object Segmentation | YouTube-VOS 2019 (val) | J-Score (Seen) | 81.6 | 231 |
| Video Object Segmentation | DAVIS 2017 (test) | J (Jaccard Index) | 74.9 | 107 |
| Video Object Segmentation | YouTube-VOS 2018 | Score G | 92.4 | 47 |
| Semi-supervised Video Object Segmentation | DAVIS 2017 (val) | J&F Score | 84.5 | 31 |
| Video Object Segmentation | DAVIS 17 | J Score | 81.7 | 25 |
| Video Object Segmentation | Long-time Video dataset (val) | J&F Score | 81.1 | 21 |
Showing 10 of 15 rows
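Several of the results above are J scores, that is, the Jaccard index (intersection over union) between a predicted mask and the ground-truth mask. The following sketch shows how such a score can be computed for binary masks. The function names and the convention of returning 1.0 when both masks are empty are assumptions for illustration, and official benchmark toolkits may differ in details such as per-object and per-sequence averaging.

```python
import numpy as np

def jaccard_index(pred_mask, gt_mask):
    """Intersection over union of two binary masks with the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty; treating this as a perfect match is an assumption.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def j_mean(pred_masks, gt_masks):
    """Average Jaccard index over paired frames (simple unweighted mean)."""
    scores = [jaccard_index(p, g) for p, g in zip(pred_masks, gt_masks)]
    return float(np.mean(scores))
```

The values returned here lie in the range 0 to 1, whereas the results listed above appear to be reported as percentages.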
