Fast Online Object Tracking and Segmentation: A Unifying Approach
About
In this paper we illustrate how to perform both visual object tracking and semi-supervised video object segmentation, in real-time, with a single simple approach. Our method, dubbed SiamMask, improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task. Once trained, SiamMask solely relies on a single bounding box initialisation and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second. Despite its simplicity, versatility and fast speed, our strategy allows us to establish a new state of the art among real-time trackers on VOT-2018, while at the same time demonstrating competitive performance and the best speed for the semi-supervised video object segmentation task on DAVIS-2016 and DAVIS-2017. The project website is http://www.robots.ox.ac.uk/~qwang/SiamMask.
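The idea of augmenting a tracker's training loss with a binary segmentation task can be illustrated with a minimal sketch. It assumes per-pixel mask logits and ground-truth labels of +1 (object) or -1 (background), scored with a binary logistic loss. The function names, the `mask_weight` parameter, and the plain weighted sum are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def mask_logistic_loss(logits, labels):
    """Mean binary logistic loss over all pixels.

    logits: 2D list of real-valued mask predictions.
    labels: 2D list of ground-truth labels, +1 (object) or -1 (background).
    """
    total = 0.0
    count = 0
    for row_logits, row_labels in zip(logits, labels):
        for m, c in zip(row_logits, row_labels):
            # log(1 + exp(-c * m)), written in a numerically stable form
            z = -c * m
            total += max(z, 0.0) + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


def combined_loss(tracking_loss, mask_loss, mask_weight=1.0):
    """Add the segmentation term to an existing tracking loss.

    mask_weight is an assumed hyperparameter that balances the two tasks.
    """
    return tracking_loss + mask_weight * mask_loss
```

In a setup like this, the tracking branch and the mask branch share one backbone, so the extra term adds supervision during offline training without changing how the tracker is initialised at test time.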
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video Object Segmentation | DAVIS 2017 (val) | J mean | 59.5 | 1130 |
| Video Object Segmentation | DAVIS 2016 (val) | J Mean | 71.7 | 564 |
| Video Object Segmentation | YouTube-VOS 2018 (val) | J Score (Seen) | 60.2 | 493 |
| Visual Object Tracking | LaSOT (test) | AUC | 46.7 | 444 |
| Video Object Segmentation | DAVIS 2017 (test-dev) | Region J Mean | 40.6 | 237 |
| Video Object Segmentation | YouTube-VOS 2019 (val) | J-Score (Seen) | 60.2 | 231 |
| Visual Object Tracking | VOT 2020 (test) | EAO | 0.321 | 147 |
| Video Object Segmentation | DAVIS 2017 (test) | J (Jaccard Index) | 54.3 | 107 |
| Video Object Segmentation | YouTube-VOS (val) | J Score (Seen) | 60.2 | 81 |
| Visual Object Tracking | VOT 2016 | EAO | 44.2 | 79 |
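
The J metric reported for the segmentation benchmarks is the Jaccard index, the intersection-over-union between a predicted mask and the ground-truth mask. The following is a simplified sketch of how a per-frame J and its mean could be computed for binary masks stored as nested lists. The function names are hypothetical, and the official benchmark toolkits additionally average over objects and sequences, which is omitted here.

```python
def jaccard(pred, gt):
    """Intersection-over-union of two binary masks (2D lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_mean(preds, gts):
    """Average the per-frame Jaccard index over a list of frames."""
    scores = [jaccard(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)
```

The table lists J as percentages, so a value such as 59.5 corresponds to a mean of 0.595 on the 0 to 1 scale returned by this sketch.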