Towards Sequence-Level Training for Visual Tracking
About
Despite the extensive adoption of machine learning for visual object tracking, recent learning-based approaches have largely overlooked the fact that visual tracking is a sequence-level task by nature; they rely heavily on frame-level training, which inevitably induces inconsistency between training and testing in terms of both data distributions and task objectives. This work introduces a sequence-level training strategy for visual tracking based on reinforcement learning and discusses how a sequence-level design of data sampling, learning objectives, and data augmentation can improve the accuracy and robustness of tracking algorithms. Our experiments on standard benchmarks, including LaSOT, TrackingNet, and GOT-10k, demonstrate that four representative tracking models (SiamRPN++, SiamAttn, TransT, and TrDiMP) consistently improve when the proposed methods are incorporated into training, without any modification to their architectures.
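To make the contrast with frame-level training concrete, the following is a minimal sketch of a sequence-level objective. It is not the authors' implementation: the mean-IoU reward, the scalar baseline, and the names `iou`, `sequence_reward`, and `reinforce_loss` are all assumptions chosen for illustration. It shows a single reward computed over an entire tracked sequence and a generic REINFORCE-style surrogate loss built from it.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height); assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def sequence_reward(pred: List[Box], gt: List[Box]) -> float:
    """One scalar per sequence: mean IoU over all frames (illustrative choice)."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(iou(p, g) for p, g in zip(pred, gt)) / len(pred)


def reinforce_loss(log_probs: List[float], reward: float, baseline: float) -> float:
    """Generic REINFORCE surrogate: -(R - b) * sum_t log pi(a_t).

    log_probs are the log-probabilities of the sampled per-frame predictions;
    the baseline is any reference reward used to reduce variance (an assumption here).
    """
    advantage = reward - baseline
    return -advantage * sum(log_probs)


if __name__ == "__main__":
    gt = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 2.0, 2.0)]
    pred = [(0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0)]  # tracker drifts on frame 2
    r = sequence_reward(pred, gt)
    print(f"sequence reward: {r:.4f}")
    print(f"loss: {reinforce_loss([-0.5, -0.5], r, baseline=0.5):.4f}")
```

Because the reward is computed only after the whole sequence has been tracked, an error on an early frame that causes later drift lowers the reward for every prediction in that sequence, which a per-frame loss on independently sampled frames would not capture.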
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Object Tracking | TrackingNet (test) | Normalized Precision (Pnorm) | 87.5 | 460 |
| Visual Object Tracking | LaSOT (test) | AUC | 66.8 | 444 |
| Visual Object Tracking | GOT-10k (test) | Average Overlap (AO) | 72.5 | 378 |
| Visual Object Tracking | LaSOT 2019 (test) | AUC | 66.8 | 31 |
| Visual Object Tracking | GOT-10k Restricted Protocol (test) | Average Overlap (AO) | 67.5 | 23 |