SwinTrack: A Simple and Strong Baseline for Transformer Tracking

About

Recently, Transformer has been largely explored in tracking and has shown state-of-the-art (SOTA) performance. However, existing efforts mainly focus on fusing and enhancing features generated by convolutional neural networks (CNNs). The potential of Transformer in representation learning remains under-explored. In this paper, we aim to further unleash the power of Transformer by proposing a simple yet efficient fully-attentional tracker, dubbed SwinTrack, within the classic Siamese framework. In particular, both representation learning and feature fusion in SwinTrack leverage the Transformer architecture, enabling better feature interactions for tracking than pure CNN or hybrid CNN-Transformer frameworks. In addition, to further enhance robustness, we present a novel motion token that embeds the historical target trajectory to improve tracking by providing temporal context. Our motion token is lightweight with negligible computation but brings clear gains. In our thorough experiments, SwinTrack exceeds existing approaches on multiple benchmarks. In particular, on the challenging LaSOT, SwinTrack sets a new record with a SUC score of 0.713. It also achieves SOTA results on other benchmarks. We expect SwinTrack to serve as a solid baseline for Transformer tracking and facilitate future research. Our code and results are released at https://github.com/LitingLin/SwinTrack.
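For readers unfamiliar with the SUC (success) score quoted above, the following is a minimal sketch of how such a score is commonly computed: per-frame intersection-over-union (IoU) between predicted and ground-truth boxes, then the area under the success-rate curve across overlap thresholds. The function names, the `(x, y, w, h)` box format, and the choice of 21 evenly spaced thresholds are assumptions for illustration; official benchmark toolkits may differ in details such as threshold handling, so this is not the evaluation code used by the authors.

```python
def iou_xywh(a, b):
    """IoU of two axis-aligned boxes given as (x, y, w, h). Hypothetical helper."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_score(ious, num_thresholds=21):
    """Mean success rate over evenly spaced IoU thresholds in [0, 1].

    Assumption: a frame counts as a success when its IoU is strictly
    greater than the threshold; toolkits vary on this detail.
    """
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(iou > t for iou in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Toy two-frame sequence: one exact match, one box shifted by 2 pixels.
    predictions = [(10, 10, 20, 20), (12, 10, 20, 20)]
    ground_truth = [(10, 10, 20, 20), (10, 10, 20, 20)]
    ious = [iou_xywh(p, g) for p, g in zip(predictions, ground_truth)]
    print(f"SUC = {success_score(ious):.3f}")
```

Under this definition the score lies in [0, 1]; leaderboards often report the same quantity multiplied by 100, which would be consistent with 0.713 in the abstract and an AUC of 71.3 on LaSOT.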

Liting Lin, Heng Fan, Zhipeng Zhang, Yong Xu, Haibin Ling • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Visual Object Tracking | TrackingNet (test) | Normalized Precision (Pnorm) | 87 | 463 |
| Visual Object Tracking | LaSOT (test) | AUC | 71.3 | 446 |
| Object Tracking | LaSoT | AUC | 71.3 | 411 |
| Visual Object Tracking | GOT-10k (test) | Average Overlap | 72.4 | 408 |
| Object Tracking | TrackingNet | Precision (P) | 82.8 | 270 |
| RGB-D Object Tracking | VOT-RGBD 2022 (public challenge) | EAO | 0.708 | 263 |
| Visual Object Tracking | GOT-10k | AO | 72.4 | 254 |
| Visual Object Tracking | UAV123 (test) | AUC | 70.5 | 188 |
| Visual Object Tracking | TNL2K | AUC | 55.9 | 121 |
| Visual Object Tracking | LaSoText | AUC | 49.1 | 112 |
Showing 10 of 35 rows
