
Transformer Tracking with Cyclic Shifting Window Attention

About

Transformer architecture has been showing its great strength in visual object tracking, owing to its effective attention mechanism. Existing transformer-based approaches adopt a pixel-to-pixel attention strategy on flattened image features and unavoidably ignore the integrity of objects. In this paper, we propose a new transformer architecture with multi-scale cyclic shifting window attention for visual object tracking, elevating attention from the pixel to the window level. The cross-window multi-scale attention has the advantage of aggregating attention at different scales and generates the best fine-scale match for the target object. Furthermore, the cyclic shifting strategy brings greater accuracy by expanding the window samples with positional information, while saving substantial computation by removing redundant calculations. Extensive experiments demonstrate the superior performance of our method, which sets new state-of-the-art records on five challenging datasets: VOT2020, UAV123, LaSOT, TrackingNet, and GOT-10k.
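The core operations the abstract describes, cyclically shifting a feature map and then partitioning it into windows for window-level attention, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the function name `cyclic_shift_windows` and the toy shapes are assumptions for demonstration only.

```python
import numpy as np

def cyclic_shift_windows(feat, window, shift):
    """Cyclically shift a (H, W, C) feature map, then partition it
    into non-overlapping (window x window) windows.

    Rolling the map before partitioning lets windows attend across
    the original window boundaries, which is the idea behind cyclic
    shifting window attention (illustrative sketch only).
    """
    # Cyclic shift: pixels that fall off one edge wrap to the other,
    # so no padding or extra windows are needed.
    shifted = np.roll(feat, shift=(-shift, -shift), axis=(0, 1))
    H, W, C = shifted.shape
    # Partition into (H//window * W//window) windows of size window x window.
    wins = shifted.reshape(H // window, window, W // window, window, C)
    wins = wins.transpose(0, 2, 1, 3, 4).reshape(-1, window, window, C)
    return wins

# Toy 4x4 single-channel feature map with values 0..15.
feat = np.arange(16).reshape(4, 4, 1)
wins = cyclic_shift_windows(feat, window=2, shift=1)
print(wins.shape)  # -> (4, 2, 2, 1): four 2x2 windows
```

In a full tracker, each window would then serve as a query/key unit for attention at several window scales, with the attention maps aggregated across scales to pick the best fine-scale match.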

Zikai Song, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang • 2022

Related benchmarks

Task                    | Dataset            | Metric                       | Result | Rank
Visual Object Tracking  | TrackingNet (test) | Normalized Precision (Pnorm) | 86.7   | 460
Visual Object Tracking  | LaSOT (test)       | AUC                          | 66.2   | 444
Visual Object Tracking  | GOT-10k (test)     | Average Overlap              | 69.4   | 378
Object Tracking         | TrackingNet        | Precision (P)                | 79.5   | 225
Visual Object Tracking  | UAV123 (test)      | AUC                          | 70.5   | 188
Visual Object Tracking  | VOT 2020 (test)    | EAO                          | 0.304  | 147
Object Tracking         | GOT-10k            | AO                           | 69.4   | 74
Visual Object Tracking  | OTB100 (test)      | --                           | --     | 41
Visual Object Tracking  | LaSOT 42 (test)    | Success Rate                 | 66.2   | 34
Visual Object Tracking  | LaSOT 2019 (test)  | AUC                          | 66.2   | 31

(Showing 10 of 12 rows)

Other info

Code
