Learning What to Learn for Video Object Segmentation

About

Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined during inference with a given first-frame reference mask. The problem of how to capture and utilize this limited target information remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learning module. This internal learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond standard few-shot learning techniques by learning what the few-shot learner should learn. This allows us to achieve a rich internal representation of the target in the current frame, significantly increasing the segmentation accuracy of our approach. We perform extensive experiments on multiple benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result.
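
The idea of an internal learner that fits a target model by minimizing a first-frame segmentation error can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes a linear target model, a regularized least-squares loss, and steepest descent with an exact step length for that quadratic loss. The names `fit_target_model` and `segmentation_loss`, the regularization weight, and the random stand-in features are all hypothetical. It also uses the raw binary mask as the regression target, so it omits the learned part described in the abstract (learning what the few-shot learner should learn).

```python
import numpy as np

def segmentation_loss(features, labels, w, reg=1e-2):
    """Regularized squared error of a linear target model (assumed loss)."""
    residual = features @ w - labels
    return 0.5 * residual @ residual + 0.5 * reg * (w @ w)

def fit_target_model(features, labels, num_iters=5, reg=1e-2):
    """Fit weights w on first-frame data with a few steepest-descent steps.

    features: (num_pixels, feat_dim) array of per-pixel features.
    labels:   (num_pixels,) array, here simply the binary first-frame mask.
    Each step uses the step length that exactly minimizes the quadratic
    loss along the negative gradient direction.
    """
    w = np.zeros(features.shape[1])
    for _ in range(num_iters):
        residual = features @ w - labels
        grad = features.T @ residual + reg * w
        feat_grad = features @ grad
        denom = feat_grad @ feat_grad + reg * (grad @ grad)
        if denom == 0.0:  # gradient vanished, already at the minimum
            break
        step = (grad @ grad) / denom
        w = w - step * grad
    return w

# Toy usage with random stand-in features (not real video data).
rng = np.random.default_rng(0)
first_frame_features = rng.normal(size=(200, 8))
first_frame_mask = (first_frame_features @ rng.normal(size=8) > 0).astype(float)

w = fit_target_model(first_frame_features, first_frame_mask)

# Apply the fitted model to features of a later frame and threshold the scores.
later_frame_features = rng.normal(size=(200, 8))
predicted_mask = (later_frame_features @ w) > 0.5
```

Because every operation in the loop is differentiable, a learner of this kind can sit inside a larger network and be trained end to end, which is the property the abstract relies on.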

Goutam Bhat, Felix Järemo Lawin, Martin Danelljan, Andreas Robinson, Michael Felsberg, Luc Van Gool, Radu Timofte • 2020

Related benchmarks

Task | Dataset | Result | Rank
Video Object Segmentation | DAVIS 2017 (val) | J mean 79.1 | 1130
Video Object Segmentation | YouTube-VOS 2018 (val) | J Score (Seen) 80.4 | 493
Visual Object Tracking | TrackingNet (test) | Normalized Precision (Pnorm) 84.4 | 460
Visual Object Tracking | LaSOT (test) | AUC 59.7 | 444
Video Object Segmentation | YouTube-VOS 2019 (val) | J-Score (Seen) 79.6 | 231
Visual Object Tracking | VOT 2020 (test) | EAO 0.472 | 147
Visual Tracking | UAV123 | AUC 59.7 | 41
Video Object Segmentation | LVOS v2 (val) | J&F 60.6 | 41
Visual Object Tracking | VOT ST 2020 | Robustness 0.798 | 23
Visual Object Tracking | VOT 2022 | EAO 51.6 | 14
