Generalized Relation Modeling for Transformer Tracking
About
Compared with previous two-stream trackers, the recent one-stream tracking pipeline, which allows earlier interaction between the template and search region, has achieved a remarkable performance gain. However, existing one-stream trackers always let the template interact with all parts inside the search region throughout all the encoder layers. This could potentially lead to target-background confusion when the extracted feature representations are not sufficiently discriminative. To alleviate this issue, we propose a generalized relation modeling method based on adaptive token division. The proposed method is a generalized formulation of attention-based relation modeling for Transformer tracking, which inherits the merits of both previous two-stream and one-stream pipelines whilst enabling more flexible relation modeling by selecting appropriate search tokens to interact with template tokens. An attention masking strategy and the Gumbel-Softmax technique are introduced to facilitate the parallel computation and end-to-end learning of the token division module. Extensive experiments show that our method is superior to the two-stream and one-stream pipelines and achieves state-of-the-art performance on six challenging benchmarks with a real-time running speed.
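The token division idea described above can be illustrated with a small sketch. It is not the authors' implementation: the function names (`gumbel_softmax`, `divide_search_tokens`, `build_attention_mask`), the two-logit-per-token layout, and the exact masking rule are assumptions chosen for illustration. The sketch assumes that each search token receives two logits (skip or interact with the template), that a Gumbel-Softmax sample turns them into a hard decision, and that a boolean attention mask then blocks template/search interaction for unselected tokens while all search tokens still attend to one another. In a trainable model the hard decision would typically be paired with a straight-through gradient, which plain Python cannot show.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed one-hot sample over `logits` (Gumbel-Softmax)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # U is clipped away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-10), 1.0 - 1e-10)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def divide_search_tokens(division_logits, tau=1.0, rng=random):
    """Hard decision per search token: 1 = interact with the template, 0 = skip.

    `division_logits` holds one [skip_logit, interact_logit] pair per search
    token (a hypothetical layout used only for this sketch).
    """
    decisions = []
    for pair in division_logits:
        probs = gumbel_softmax(pair, tau, rng)
        decisions.append(1 if probs[1] > probs[0] else 0)
    return decisions


def build_attention_mask(num_template, decisions):
    """Build mask[q][k] = True when query token q may attend to key token k.

    Tokens are ordered as template tokens followed by search tokens.
    Assumed rule: template and search tokens see each other only when the
    search token was selected; template-template and search-search pairs
    are always allowed.
    """
    n = num_template + len(decisions)

    def is_template(i):
        return i < num_template

    def is_selected(i):
        return decisions[i - num_template] == 1

    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if is_template(q) == is_template(k):
                mask[q][k] = True            # same group: always allowed
            elif is_template(q):
                mask[q][k] = is_selected(k)  # template query, search key
            else:
                mask[q][k] = is_selected(q)  # search query, template key
    return mask
```

Because the division is expressed as a mask rather than by physically splitting the token sequence, all tokens can stay in one batch and pass through a standard attention layer, which is the motivation the abstract gives for the attention masking strategy.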
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Object Tracking | TrackingNet (test) | Normalized Precision (Pnorm) | 88.9 | 460 |
| Visual Object Tracking | LaSOT (test) | AUC | 71.4 | 444 |
| Visual Object Tracking | GOT-10k (test) | Average Overlap | 73.4 | 378 |
| Object Tracking | LaSoT | AUC | 71.4 | 333 |
| Object Tracking | TrackingNet | Precision (P) | 84 | 225 |
| Visual Object Tracking | GOT-10k | AO | 73.4 | 223 |
| Visual Object Tracking | UAV123 (test) | AUC | 70.2 | 188 |
| Visual Object Tracking | NFS (Need for Speed) 30 FPS (test) | AUC | 65.6 | 54 |
| Visual Object Tracking | GOT-10k 1.0 (test) | AO | 73.4 | 51 |
| Visual Object Tracking | LaSoT | AUC | 69.9 | 44 |