A Twofold Siamese Network for Real-Time Object Tracking
About
Observing that Semantic features learned in an image classification task and Appearance features learned in a similarity matching task complement each other, we build a twofold Siamese network, named SA-Siam, for real-time object tracking. SA-Siam is composed of a semantic branch and an appearance branch, each of which is a similarity-learning Siamese network. An important design choice in SA-Siam is to train the two branches separately, which preserves the heterogeneity of the two types of features. In addition, we propose a channel attention mechanism for the semantic branch, in which channel-wise weights are computed from the channel activations around the target position. While the architecture inherited from SiamFC \cite{SiamFC} allows our tracker to operate beyond real-time, the twofold design and the attention mechanism significantly improve the tracking performance. The proposed SA-Siam outperforms all other real-time trackers by a large margin on the OTB-2013/50/100 benchmarks.
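The twofold design described above can be illustrated with a toy sketch in plain Python. It is not the authors' implementation. The function names, the sigmoid-based weighting rule in `channel_weights`, and the fusion weight `lam` are all assumptions made for illustration. The abstract only states that each branch is a similarity-learning Siamese network and that channel weights depend on activations around the target position. It does not specify how the two responses are combined, so the weighted sum used here is a placeholder.

```python
import math


def cross_correlate(search, template):
    """Slide a (C, h, w) template over a (C, H, W) search map, summing over channels."""
    h, w = len(template[0]), len(template[0][0])
    H, W = len(search[0]), len(search[0][0])
    response = []
    for i in range(H - h + 1):
        row = []
        for j in range(W - w + 1):
            score = 0.0
            for s_ch, t_ch in zip(search, template):
                for di in range(h):
                    for dj in range(w):
                        score += s_ch[i + di][j + dj] * t_ch[di][dj]
            row.append(score)
        response.append(row)
    return response


def channel_weights(features, center, radius=1):
    """Assumed stand-in for the attention module (not the learned one):
    weight = sigmoid(peak activation near the target - mean of the channel)."""
    cy, cx = center
    weights = []
    for ch in features:
        rows = ch[max(cy - radius, 0):cy + radius + 1]
        near = [v for r in rows for v in r[max(cx - radius, 0):cx + radius + 1]]
        everything = [v for r in ch for v in r]
        gap = max(near) - sum(everything) / len(everything)
        weights.append(1.0 / (1.0 + math.exp(-gap)))
    return weights


def scale_channels(features, weights):
    """Multiply every value of each channel by that channel's weight."""
    return [[[v * wt for v in r] for r in ch] for ch, wt in zip(features, weights)]


def fuse(appearance, semantic, lam=0.5):
    """Assumed fusion rule: element-wise weighted sum of the two response maps."""
    return [[lam * a + (1.0 - lam) * s for a, s in zip(ra, rs)]
            for ra, rs in zip(appearance, semantic)]


def sa_response(tmpl_a, search_a, tmpl_s, search_s, center, lam=0.5):
    """Compute each branch's response independently, then fuse them."""
    weights = channel_weights(tmpl_s, center)
    resp_a = cross_correlate(search_a, tmpl_a)
    resp_s = cross_correlate(search_s, scale_channels(tmpl_s, weights))
    return fuse(resp_a, resp_s, lam)
```

In this sketch the two branches never share features. Each produces its own response map and only the maps are combined, which mirrors the separate-training idea in the abstract. The position of the maximum in the fused map would serve as the estimated target location.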
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Object Tracking | OTB-100 | AUC | 65.7 | 136 |
| Visual Object Tracking | VOT 2016 | EAO | 29.1 | 79 |
| Visual Tracking | VOT 2016 (test) | EAO | 0.2911 | 70 |
| Object Tracking | OTB 2015 (test) | AUC | 0.656 | 63 |
| Visual Object Tracking | VOT 2015 | EAO | 0.31 | 61 |
| Visual Object Tracking | OTB 2013 | AUC | 67.7 | 60 |
| Visual Object Tracking | OTB 2015 | AUC | 65.7 | 58 |
| Visual Object Tracking | OTB100 (test) | AUC | 0.657 | 41 |
| Visual Tracking | VOT 2015 (test) | Accuracy | 59 | 20 |
| Visual Object Tracking | OTB-50 | AUC | 0.61 | 20 |