Hierarchical Memory Matching Network for Video Object Segmentation

About

We present Hierarchical Memory Matching Network (HMMN) for semi-supervised video object segmentation. Based on a recent memory-based method [33], we propose two advanced memory read modules that enable us to perform memory reading in multiple scales while exploiting temporal smoothness. We first propose a kernel guided memory matching module that replaces the non-local dense memory read, commonly adopted in previous memory-based methods. The module imposes the temporal smoothness constraint in the memory read, leading to accurate memory retrieval. More importantly, we introduce a hierarchical memory matching scheme and propose a top-k guided memory matching module in which memory read on a fine-scale is guided by that on a coarse-scale. With the module, we perform memory read in multiple scales efficiently and leverage both high-level semantic and low-level fine-grained memory features to predict detailed object masks. Our network achieves state-of-the-art performance on the validation sets of DAVIS 2016/2017 (90.8% and 84.7%) and YouTube-VOS 2018/2019 (82.6% and 82.5%), and test-dev set of DAVIS 2017 (78.6%). The source code and model are available online: https://github.com/Hongje/HMMN.
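The coarse-to-fine idea in the abstract, where the fine-scale memory read is guided by the coarse-scale one, can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes dot-product similarity, a fixed 2x resolution ratio between scales, row-major flattened feature maps, and a single query pixel. All function names and the `children` index mapping are hypothetical.

```python
import math

def dot(a, b):
    """Dot-product similarity between two feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def topk_indices(scores, k):
    """Indices of the k largest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def children(coarse_idx, coarse_w, scale=2):
    """Fine-scale flat indices covered by one coarse-scale position.

    Assumes row-major flattening and a fine map that is `scale` times
    larger than the coarse map in each dimension (an illustrative choice).
    """
    cy, cx = divmod(coarse_idx, coarse_w)
    fine_w = coarse_w * scale
    return [(cy * scale + dy) * fine_w + (cx * scale + dx)
            for dy in range(scale) for dx in range(scale)]

def topk_guided_read(q_coarse, mem_k_coarse, coarse_w,
                     q_fine, mem_k_fine, mem_v_fine, k=2):
    """Read fine-scale memory for one query pixel, guided by coarse top-k.

    1. Score the query against every coarse memory key (dense, but cheap).
    2. Keep only the k best coarse positions.
    3. Match at the fine scale only against the children of those positions.
    Returns the read-out value vector and the fine candidate indices.
    """
    coarse_scores = [dot(q_coarse, key) for key in mem_k_coarse]
    candidates = [f for c in topk_indices(coarse_scores, k)
                  for f in children(c, coarse_w)]
    weights = softmax([dot(q_fine, mem_k_fine[f]) for f in candidates])
    dim = len(mem_v_fine[0])
    read = [sum(w * mem_v_fine[f][d] for w, f in zip(weights, candidates))
            for d in range(dim)]
    return read, candidates
```

In this sketch the fine-scale matching touches only `k * scale * scale` memory positions per query pixel instead of every fine-scale position, which is the source of the efficiency the abstract refers to.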

Hongje Seong, Seoung Wug Oh, Joon-Young Lee, Seongwon Lee, Suhyeon Lee, Euntai Kim • 2021

Related benchmarks

Task | Dataset | Result | Rank
Video Object Segmentation | DAVIS 2017 (val) | J mean 81.9 | 1130
Video Object Segmentation | DAVIS 2016 (val) | J Mean 89.6 | 564
Video Object Segmentation | YouTube-VOS 2018 (val) | J Score (Seen) 82.1 | 493
Video Object Segmentation | DAVIS 2017 (test-dev) | Region J Mean 74.7 | 237
Video Object Segmentation | YouTube-VOS 2019 (val) | J-Score (Seen) 81.7 | 231
Video Object Segmentation | DAVIS 2017 (test) | J (Jaccard Index) 74.7 | 107
Mask Prediction | Youtube-VOS | BCE Loss 1.567 | 5
Mask Prediction | DAVIS | BCE Loss 3.738 | 5
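Most of the results above are reported as J, the Jaccard index (intersection over union) between a predicted mask and the ground-truth mask. A minimal sketch of that measure follows. It assumes binary masks given as nested lists of 0/1 and a plain average over per-mask scores. The function names are hypothetical, and returning 1.0 for two empty masks is an assumed convention, not something stated on this page.

```python
def jaccard(pred, gt):
    """Jaccard index (intersection over union) of two binary masks."""
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union

def j_mean(preds, gts):
    """Average Jaccard index over a sequence of (prediction, ground truth) pairs."""
    scores = [jaccard(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)
```

Benchmark tables usually report this value as a percentage, averaged over all annotated objects and frames.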
