SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks

About

Siamese-network-based trackers formulate tracking as convolutional feature cross-correlation between a target template and a search region. However, Siamese trackers still have an accuracy gap compared with state-of-the-art algorithms, and they cannot take advantage of features from deep networks such as ResNet-50 or deeper. In this work we prove that the core reason is the lack of strict translation invariance. Through comprehensive theoretical analysis and experimental validation, we break this restriction with a simple yet effective spatial-aware sampling strategy and successfully train a ResNet-driven Siamese tracker with a significant performance gain. Moreover, we propose a new model architecture that performs depth-wise and layer-wise aggregations, which not only further improves accuracy but also reduces the model size. We conduct extensive ablation studies to demonstrate the effectiveness of the proposed tracker, which currently obtains the best results on four large tracking benchmarks: OTB2015, VOT2018, UAV123, and LaSOT. Our model will be released to facilitate further studies on this problem.
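To make the cross-correlation idea in the abstract concrete, here is a minimal pure-Python sketch of a depth-wise cross-correlation, in which each template channel is slid over the matching search-region channel to produce one response map per channel. It is an illustration only, not the authors' implementation: the names `depthwise_xcorr` and `peak` and the toy feature maps are assumptions made for this example. A real tracker would apply the same operation to learned deep features using a framework's grouped convolution.

```python
from typing import List, Tuple

# Feature maps are indexed as [channel][row][col] (an assumption of this sketch).
Feature = List[List[List[float]]]


def depthwise_xcorr(search: Feature, template: Feature) -> Feature:
    """Slide each template channel over the matching search channel.

    Returns one response map per channel (valid positions only, stride 1).
    """
    if len(search) != len(template):
        raise ValueError("search and template need the same channel count")
    responses = []
    for s_ch, t_ch in zip(search, template):
        th, tw = len(t_ch), len(t_ch[0])
        out_h = len(s_ch) - th + 1
        out_w = len(s_ch[0]) - tw + 1
        # Each output cell is the dot product of the template with one window.
        resp = [
            [
                sum(
                    s_ch[y + i][x + j] * t_ch[i][j]
                    for i in range(th)
                    for j in range(tw)
                )
                for x in range(out_w)
            ]
            for y in range(out_h)
        ]
        responses.append(resp)
    return responses


def peak(response: List[List[float]]) -> Tuple[int, int]:
    """Return (row, col) of the highest score in one response map."""
    best = max(
        (v, y, x) for y, row in enumerate(response) for x, v in enumerate(row)
    )
    return best[1], best[2]


# Toy single-channel example: a 2x2 blob of ones inside a 4x4 search region.
search = [[[0, 0, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 0, 0, 0]]]
template = [[[1, 1],
             [1, 1]]]

responses = depthwise_xcorr(search, template)
target_row, target_col = peak(responses[0])
```

In this toy case the response map peaks where the template overlaps the blob exactly, which is how a Siamese tracker localizes the target inside the search region.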

Bo Li, Wei Wu, Qiang Wang, Fangyi Zhang, Junliang Xing, Junjie Yan • 2018

Related benchmarks

Task                       Dataset             Metric                        Result  Rank
Video Object Segmentation  DAVIS 2017 (val)    J mean                        56.8    1130
Visual Object Tracking     TrackingNet (test)  Normalized Precision (Pnorm)  80      460
Visual Object Tracking     LaSOT (test)        AUC                           49.6    444
Visual Object Tracking     GOT-10k (test)      Average Overlap               51.8    378
Object Tracking            LaSOT               AUC                           49.6    333
Object Tracking            TrackingNet         Precision (P)                 69.4    225
Visual Object Tracking     GOT-10k             AO                            61.6    223
Visual Object Tracking     UAV123 (test)       AUC                           64.2    188
Visual Object Tracking     UAV123              AUC                           0.613   165
Visual Object Tracking     OTB-100             AUC                           69.6    136
Showing 10 of 82 rows
...

