Siam R-CNN: Visual Tracking by Re-Detection

About

We present Siam R-CNN, a Siamese re-detection architecture which unleashes the full power of two-stage object detection approaches for visual object tracking. We combine this with a novel tracklet-based dynamic programming algorithm, which takes advantage of re-detections of both the first-frame template and previous-frame predictions, to model the full history of both the object to be tracked and potential distractor objects. This enables our approach to make better tracking decisions, as well as to re-detect tracked objects after long occlusion. Finally, we propose a novel hard example mining strategy to improve Siam R-CNN's robustness to similar-looking objects. Siam R-CNN achieves the current best performance on ten tracking benchmarks, with especially strong results for long-term tracking. We make our code and models available at www.vision.rwth-aachen.de/page/siamrcnn.

Paul Voigtlaender, Jonathon Luiten, Philip H.S. Torr, Bastian Leibe • 2019
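
The tracklet-based dynamic programming mentioned in the abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm: it is a minimal, simplified illustration that assumes each tracklet already carries a single similarity score to the first-frame template, and that consecutive tracklets are penalised by the L1 distance between their boxes. The names `Tracklet`, `jump_penalty`, `best_tracklet_chain`, and the weight `w_loc` are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Tracklet:
    start: int        # first frame index (inclusive)
    end: int          # last frame index (inclusive)
    score: float      # assumed: summed similarity to the first-frame template
    first_box: Box    # box at frame `start`
    last_box: Box     # box at frame `end`


def jump_penalty(a: Tracklet, b: Tracklet) -> float:
    """L1 distance between the end of `a` and the start of `b` (assumed cost)."""
    return sum(abs(p - q) for p, q in zip(a.last_box, b.first_box))


def best_tracklet_chain(tracklets: List[Tracklet], w_loc: float = 0.01):
    """Pick the temporally non-overlapping chain of tracklets with the highest
    total score, using dynamic programming over tracklets sorted by start frame."""
    order = sorted(range(len(tracklets)), key=lambda i: tracklets[i].start)
    best: Dict[int, float] = {}          # best chain score ending at tracklet i
    prev: Dict[int, Optional[int]] = {}  # predecessor of i in that chain
    for i in order:
        t = tracklets[i]
        best[i], prev[i] = t.score, None  # option: the chain starts at t
        for j in list(best):
            u = tracklets[j]
            if j != i and u.end < t.start:  # u must finish before t begins
                cand = best[j] + t.score - w_loc * jump_penalty(u, t)
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    if not best:
        return [], 0.0
    last: Optional[int] = max(best, key=best.get)
    total = best[last]
    chain: List[int] = []
    while last is not None:  # backtrack through predecessors
        chain.append(last)
        last = prev[last]
    return chain[::-1], total
```

In this toy setting, a distractor tracklet with a slightly higher appearance score but a large spatial jump loses to a spatially consistent tracklet, which is the kind of decision a history-aware formulation is meant to support.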

Related benchmarks

Task	Dataset	Metric	Result	
Video Object Segmentation	DAVIS 2017 (val)	J mean	69.3	1226
Video Object Segmentation	DAVIS 2016 (val)	--	--	564
Visual Object Tracking	TrackingNet (test)	Normalized Precision (Pnorm)	85.4	502
Object Tracking	LaSOT	AUC	64.8	498
Video Object Segmentation	YouTube-VOS 2018 (val)	J Score (Seen)	73.5	493
Visual Object Tracking	LaSOT (test)	AUC	64.8	470
Visual Object Tracking	GOT-10k (test)	Average Overlap	64.9	450
Object Tracking	TrackingNet	Precision (P)	80	327
Visual Object Tracking	GOT-10k	AO	64.9	306
Video Object Segmentation	YouTube-VOS 2019 (val)	J-Score (Seen)	68.1	240

Showing 10 of 70 rows
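
Two of the metrics listed above are overlap measures: Average Overlap (AO) averages the intersection-over-union (IoU) between predicted and ground-truth boxes, and J is the Jaccard index (IoU) between predicted and ground-truth masks. The following is a minimal sketch of those definitions only. It is not any benchmark's official evaluation code, it ignores toolkit-specific details such as per-sequence averaging or frames where the target is absent, and the function names are chosen for this example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_overlap(pred: Sequence[Box], gt: Sequence[Box]) -> float:
    """Mean per-frame IoU over one sequence (simplified AO)."""
    return sum(box_iou(p, g) for p, g in zip(pred, gt)) / len(gt)


def mask_jaccard(pred: List[List[int]], gt: List[List[int]]) -> float:
    """Jaccard index of two same-shaped binary masks given as nested lists."""
    pairs = [(bool(p), bool(g)) for rp, rg in zip(pred, gt) for p, g in zip(rp, rg)]
    inter = sum(p and g for p, g in pairs)
    union = sum(p or g for p, g in pairs)
    # Assumed convention for this sketch: two empty masks count as a perfect match.
    return inter / union if union else 1.0
```

The values in the table are these ratios expressed as percentages, so an AO of 64.9 corresponds to a mean IoU of 0.649.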

