SiamVGG: Visual Tracking using Deeper Siamese Networks
About
Recently, we have seen rapid development of Deep Neural Network (DNN) based visual tracking solutions. Some trackers combine DNN-based solutions with Discriminative Correlation Filters (DCF) to extract semantic features and deliver state-of-the-art tracking accuracy. However, these solutions are highly compute-intensive and require long processing times, so real-time performance is not guaranteed. To deliver both high accuracy and reliable real-time performance, we propose a novel tracker called SiamVGG (https://github.com/leeyeehoo/SiamVGG). It combines a Convolutional Neural Network (CNN) backbone with a cross-correlation operator, and takes advantage of the features from exemplar images for more accurate object tracking. The architecture of SiamVGG is customized from VGG-16, with parameters shared between the exemplar images and the input video frames. We evaluate SiamVGG on the OTB-2013/50/100 and VOT 2015/2016/2017 datasets, achieving state-of-the-art accuracy while maintaining real-time performance of 50 FPS on a GTX 1080Ti. Our design achieves 2% higher Expected Average Overlap (EAO) than ECO and C-COT in the VOT2017 Challenge.
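The core idea of a shared backbone followed by cross-correlation can be illustrated with a minimal, dependency-free sketch. This is not the SiamVGG implementation: `embed` is a hypothetical stand-in for the modified VGG-16 backbone (here it only subtracts the mean), and the names `cross_correlate` and `locate` are chosen for illustration. The sketch assumes single-channel 2D inputs given as nested lists.

```python
def embed(image):
    """Hypothetical stand-in for the shared backbone: subtract the mean.

    In a Siamese tracker the same function (same weights) is applied to
    both the exemplar image and the search image.
    """
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    return [[v - mean for v in row] for row in image]


def cross_correlate(search_feat, exemplar_feat):
    """Slide the exemplar features over the search features (no padding)."""
    sh, sw = len(search_feat), len(search_feat[0])
    eh, ew = len(exemplar_feat), len(exemplar_feat[0])
    score_map = []
    for i in range(sh - eh + 1):
        row = []
        for j in range(sw - ew + 1):
            score = 0.0
            for di in range(eh):
                for dj in range(ew):
                    score += search_feat[i + di][j + dj] * exemplar_feat[di][dj]
            row.append(score)
        score_map.append(row)
    return score_map


def locate(score_map):
    """Return the (row, col) of the highest score in the map."""
    _, position = max(
        (score, (i, j))
        for i, row in enumerate(score_map)
        for j, score in enumerate(row)
    )
    return position


# Toy data: the exemplar pattern is placed in the search image with its
# top-left corner at row 2, column 1.
exemplar = [[0, 5],
            [5, 0]]
search = [[0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 5, 0, 0],
          [0, 5, 0, 0, 0],
          [0, 0, 0, 0, 0]]

score_map = cross_correlate(embed(search), embed(exemplar))
peak = locate(score_map)
print(peak)
```

The peak of the score map marks where the search region best matches the exemplar. A real tracker would replace `embed` with a learned multi-channel CNN and map the peak back to image coordinates.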
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Object Tracking | OTB-100 | AUC | 65.4 | 136 |
| Visual Tracking | VOT 2016 (test) | EAO | 0.351 | 70 |
| Visual Object Tracking | VOT 2015 | EAO | 0.373 | 61 |
| Visual Object Tracking | OTB-50 | AUC | 0.61 | 20 |
| Visual Object Tracking | OTB 2013 | AUC | 66.5 | 17 |
| Visual Object Tracking | VOT real-time challenge 2017 toolkit 6.0.3 | EAO | 0.275 | 14 |
| Visual Object Tracking | VOT 2017 6.0.3 (test) | EAO | 0.286 | 14 |