MAST: A Memory-Augmented Self-supervised Tracker

About

Recent interest in self-supervised dense tracking has yielded rapid progress, but performance remains far from that of supervised methods. We propose a dense tracking model trained on videos without any annotations that surpasses previous self-supervised methods on existing benchmarks by a significant margin (+15%), and achieves performance comparable to supervised methods. In this paper, we first reassess the traditional choices used for self-supervised training and reconstruction loss by conducting thorough experiments that elucidate the optimal choices. Second, we further improve on existing methods by augmenting our architecture with a crucial memory component. Third, we benchmark on large-scale semi-supervised video object segmentation (a.k.a. dense tracking), and propose a new metric: generalizability. Our first two contributions yield a self-supervised network that, for the first time, is competitive with supervised methods on standard evaluation metrics of dense tracking. When measuring generalizability, we show that self-supervised approaches are actually superior to the majority of supervised methods. We believe this new generalizability metric can better capture the real-world use cases for dense tracking, and will spur new interest in this research direction.

Zihang Lai, Erika Lu, Weidi Xie • 2020

Related benchmarks

Task	Dataset	Result
Video Object Segmentation	DAVIS 2017 (val)	J mean 63.3	1251
Video Object Segmentation	YouTube-VOS 2018 (val)	J Score (Seen) 63.9	493
Video Object Segmentation	YouTube-VOS 2019 (val)	J-Score (Seen) 64.3	240
Video Object Segmentation	DAVIS 2017 (test)	J (Jaccard Index) 71	107
Video Object Segmentation	DAVIS 2017	Jaccard Index (J) 71	82
Video Object Segmentation	YouTube-VOS (val)	--	81
Video label propagation	PerMIS Video	J&F Score 65.1	7
Video label propagation	DAVIS 2017 (val)	J&F Score 65.5	7
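
The J values in the table are Jaccard indices, i.e. the intersection over union between a predicted segmentation mask and the ground-truth mask, averaged over frames. The following is a minimal sketch of that computation, not the benchmarks' official evaluation code. The function names, the nested-list mask representation, and the convention that two empty masks score 1.0 are assumptions made for illustration. It returns a fraction in [0, 1], whereas the table reports percentages.

```python
def jaccard_index(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_jaccard(preds, targets):
    """Average the per-frame Jaccard index over a sequence of mask pairs."""
    scores = [jaccard_index(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Hypothetical 2x3 masks: 2 pixels overlap, 4 pixels are covered in total.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(jaccard_index(pred, target))  # 0.5
```

Multiplying the mean by 100 gives a number on the same scale as the table entries.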

