
Recurrent Dynamic Embedding for Video Object Segmentation

About

Space-time memory (STM) based video object segmentation (VOS) networks usually grow their memory bank by adding a new entry every few frames, which yields excellent performance. However, 1) hardware cannot sustain the ever-increasing memory requirements as the video length increases, and 2) storing large amounts of information inevitably introduces noise, which makes it harder to read the most important information from the memory bank. In this paper, we propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size. Specifically, we explicitly generate and update the RDE with the proposed Spatio-temporal Aggregation Module (SAM), which exploits cues from historical information. To avoid error accumulation caused by the recurrent use of SAM, we propose an unbiased guidance loss during training, which makes SAM more robust on long videos. Moreover, the predicted masks stored in the memory bank are inaccurate because network inference is imperfect, and this degrades the segmentation of the query frame. To address this problem, we design a novel self-correction strategy that lets the network repair the embeddings of masks of varying quality in the memory bank. Extensive experiments show that our method achieves the best trade-off between performance and speed. Code is available at https://github.com/Limingxing00/RDE-VOS-CVPR2022.
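The contrast between a growing and a constant-size memory bank can be illustrated with a toy sketch. The code below is a minimal, hypothetical Python illustration, not the authors' implementation. `GrowingMemory` mimics an STM-style bank that appends an entry every `interval` frames, while `ConstantMemory` keeps a fixed first-frame slot plus one slot that is updated recurrently; this two-slot layout is an assumption made for illustration. The exponential blend controlled by the hypothetical `alpha` parameter stands in for SAM, which in the paper is a learned spatio-temporal aggregation module, and neither the unbiased guidance loss nor the self-correction strategy is modeled. Keys and values are plain lists of floats rather than feature maps.

```python
import math
from typing import List

Vec = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """STM-style read: softmax over query-key dot products, then a weighted sum of values."""
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


class GrowingMemory:
    """STM-style bank: appends one (key, value) pair every `interval` frames, so it grows with video length."""

    def __init__(self, interval: int) -> None:
        self.interval = interval
        self.keys: List[Vec] = []
        self.values: List[Vec] = []

    def update(self, t: int, key: Vec, value: Vec) -> None:
        if t % self.interval == 0:
            self.keys.append(key)
            self.values.append(value)


class ConstantMemory:
    """Constant-size bank: a fixed first-frame slot plus one recurrently updated slot.

    The exponential blend below is a hypothetical stand-in for SAM, not the paper's module.
    """

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha  # hypothetical blending weight given to the newest frame
        self.keys: List[Vec] = []
        self.values: List[Vec] = []

    def update(self, t: int, key: Vec, value: Vec) -> None:
        if t == 0:
            # Slot 0 keeps the first (annotated) frame; slot 1 starts as a copy and then evolves.
            self.keys = [list(key), list(key)]
            self.values = [list(value), list(value)]
            return
        a = self.alpha
        # Only slot 1 changes, so the bank size never grows.
        self.keys[1] = [(1 - a) * old + a * new for old, new in zip(self.keys[1], key)]
        self.values[1] = [(1 - a) * old + a * new for old, new in zip(self.values[1], value)]


if __name__ == "__main__":
    growing, constant = GrowingMemory(interval=5), ConstantMemory(alpha=0.5)
    for t in range(100):
        key = [math.sin(t), math.cos(t)]
        value = [float(t), 1.0]
        growing.update(t, key, value)
        constant.update(t, key, value)
    print(len(growing.keys), len(constant.keys))  # 20 2
    print(read_memory([1.0, 0.0], constant.keys, constant.values))
```

After 100 frames with `interval=5`, the growing bank holds 20 entries while the constant bank still holds 2, so the cost of each memory read stays fixed regardless of video length. The trade-off in this toy version is that older frames survive only through the blended slot, which is why the paper relies on a learned module and a dedicated training loss rather than a fixed blend.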

Mingxing Li, Li Hu, Zhiwei Xiong, Bang Zhang, Pan Pan, Dong Liu• 2022

Related benchmarks

Task | Dataset | Metric | Result | Rank
Video Object Segmentation | DAVIS 2017 (val) | J Mean | 82.1 | 1130
Video Object Segmentation | DAVIS 2016 (val) | J Mean | 90 | 564
Video Object Segmentation | YouTube-VOS 2019 (val) | J-Score (Seen) | 81.9 | 231
Video Object Segmentation | DAVIS 2017 (test) | J (Jaccard Index) | 74.9 | 107
Video Object Segmentation | SA-V (val) | J&F | 51.8 | 74
Video Object Segmentation | SA-V (test) | J&F | 53.9 | 70
Video Object Segmentation | YouTube-VOS 2018 | G (overall score) | 83.3 | 47
Video Object Segmentation | MOSE (val) | J&F | 46.8 | 45
Video Object Segmentation | DAVIS 2017 | Jaccard Index (J) | 82.1 | 42
Video Object Segmentation | LVOS v2 (val) | J&F | 62.2 | 41
Showing 10 of 25 rows.
