Transformer Tracking
About
Correlation plays a critical role in tracking, especially in recent popular Siamese-based trackers. The correlation operation is a simple way to fuse features by measuring the similarity between the template and the search region. However, correlation itself is a local linear matching process, which loses semantic information and easily falls into local optima; this may be the bottleneck in designing high-accuracy tracking algorithms. Is there a better feature fusion method than correlation? To address this question, and inspired by the Transformer, this work presents a novel attention-based feature fusion network that effectively combines the template and search region features using attention alone. Specifically, the proposed method includes an ego-context augment module based on self-attention and a cross-feature augment module based on cross-attention. Finally, we present a Transformer tracking method (named TransT) built on a Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and a classification and regression head. Experiments show that TransT achieves very promising results on six challenging datasets, especially on the large-scale LaSOT, TrackingNet, and GOT-10k benchmarks. Our tracker runs at approximately 50 fps on GPU. Code and models are available at https://github.com/chenxin-dlut/TransT.
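The two attention modules named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the released TransT code: the function names, token counts, and feature dimension below are hypothetical, and the sketch assumes plain single-head scaled dot-product attention with a residual connection, omitting the learned projections, multiple heads, positional encodings, and normalization layers a real implementation would use. It only shows how self-attention refines each branch and how cross-attention lets every search-region token attend to all template tokens, rather than being matched by a local linear correlation.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of feature vectors.

    queries: Nq vectors of dim d; keys/values: Nk vectors of dim d.
    Returns Nq vectors of dim d, each a weighted average of `values`.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)])
    return out


def add(a, b):
    """Element-wise residual sum of two equally shaped lists of vectors."""
    return [[x + y for x, y in zip(u, v)] for u, v in zip(a, b)]


def ego_context_augment(x):
    # Self-attention (hypothetical simplification): queries, keys, and
    # values all come from the same branch.
    return add(x, attention(x, x, x))


def cross_feature_augment(x, other):
    # Cross-attention (hypothetical simplification): queries come from x,
    # keys and values come from the other branch.
    return add(x, attention(x, other, other))


random.seed(0)
dim = 8  # assumed feature dimension, chosen only for illustration
template = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(4)]   # 4 template tokens
search = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(16)]    # 16 search tokens

template = ego_context_augment(template)
search = ego_context_augment(search)
fused = cross_feature_augment(search, template)
print(len(fused), len(fused[0]))  # 16 8
```

The fused output keeps the shape of the search-region features (one vector per search token), so a classification and regression head could in principle be applied to each token independently.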
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Object Tracking | TrackingNet (test) | Normalized Precision (Pnorm) | 86.8 | 460 |
| Visual Object Tracking | LaSOT (test) | AUC | 64.9 | 444 |
| Visual Object Tracking | GOT-10k (test) | Average Overlap | 72.3 | 378 |
| Object Tracking | LaSOT | AUC | 64.9 | 333 |
| RGB-T Tracking | LasHeR (test) | PR | 52.4 | 244 |
| Object Tracking | TrackingNet | Precision (P) | 80.3 | 225 |
| Visual Object Tracking | GOT-10k | AO | 76.8 | 223 |
| RGB-T Tracking | RGBT234 (test) | Precision Rate | 82.7 | 189 |
| Visual Object Tracking | UAV123 (test) | AUC | 69.1 | 188 |
| Visual Object Tracking | UAV123 | AUC | 0.694 | 165 |