Explicit Visual Prompts for Visual Object Tracking
About
How to effectively exploit spatio-temporal information is crucial to capturing target appearance changes in visual tracking. However, most deep learning-based trackers focus mainly on designing a complicated appearance model or template updating strategy, while neglecting the context between consecutive frames, which leads to the when-and-how-to-update dilemma. To address these issues, we propose a novel explicit visual prompts framework for visual tracking, dubbed EVPTrack. Specifically, we utilize spatio-temporal tokens to propagate information between consecutive frames without focusing on updating templates. As a result, we can not only alleviate the challenge of when-to-update, but also avoid the hyper-parameters associated with updating strategies. Then, we utilize the spatio-temporal tokens to generate explicit visual prompts that facilitate inference in the current frame. The prompts are fed into a transformer encoder together with the image tokens without additional processing. Consequently, the efficiency of our model is improved by avoiding how-to-update. In addition, we consider multi-scale information as explicit visual prompts, providing multi-scale template features to enhance EVPTrack's ability to handle target scale changes. Extensive experimental results on six benchmarks (i.e., LaSOT, LaSOT_ext, GOT-10k, UAV123, TrackingNet, and TNL2K) validate that our EVPTrack can achieve competitive performance at a real-time speed by effectively exploiting both spatio-temporal and multi-scale information. Code and models are available at https://github.com/GXNU-ZhongLab/EVPTrack.
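The token flow described in the abstract can be illustrated with a toy sketch. This is not the EVPTrack implementation: the function names (`self_attention`, `track_frame`, `random_tokens`), the token counts, and the use of a single identity-projection attention layer in place of a real transformer encoder are all assumptions made for illustration. It also assumes that the encoder outputs at the prompt positions are what gets carried to the next frame, which the abstract does not specify.

```python
import math
import random


def self_attention(tokens):
    """Toy single-head self-attention with identity projections.

    Stands in for a transformer encoder; a real encoder has learned weights.
    """
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[d] for w, v in zip(weights, tokens))
                    for d in range(dim)])
    return out


def track_frame(prompt_tokens, template_tokens, search_tokens):
    """Process one frame: prompts are concatenated with the image tokens as-is."""
    sequence = prompt_tokens + template_tokens + search_tokens
    encoded = self_attention(sequence)
    n_p, n_t = len(prompt_tokens), len(template_tokens)
    new_prompts = encoded[:n_p]            # assumption: carried to the next frame
    search_features = encoded[n_p + n_t:]  # would feed a box prediction head
    return new_prompts, search_features


def random_tokens(rng, count, dim):
    """Hypothetical stand-in for features extracted from an image patch grid."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM = 8
template = random_tokens(rng, 4, DIM)  # fixed template, never updated
prompts = random_tokens(rng, 2, DIM)   # hypothetical initial prompt state
for frame_idx in range(3):
    search = random_tokens(rng, 6, DIM)
    prompts, search_features = track_frame(prompts, template, search)
```

The point of the sketch is the control flow: the template stays fixed, so there is no update schedule to tune, and temporal context reaches the current frame only through the small set of tokens passed from one call of `track_frame` to the next.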
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Object Tracking | TrackingNet (test) | Normalized Precision (Pnorm) | 88.3 | 460 |
| Visual Object Tracking | LaSOT (test) | AUC | 70.4 | 444 |
| Visual Object Tracking | GOT-10k (test) | Average Overlap | 73.3 | 378 |
| Object Tracking | LaSoT | AUC | 70.4 | 333 |
| Object Tracking | TrackingNet | -- | -- | 225 |
| Visual Object Tracking | GOT-10k | AO | 73.3 | 223 |
| Visual Object Tracking | LaSOT_ext (test) | AUC | 48.7 | 85 |
| Visual Object Tracking | GOT-10k 1.0 (test) | AO | 76.6 | 51 |
| Visual Object Tracking | LaSOT 1.0 (test) | AUC | 72.7 | 42 |
| Visual Object Tracking | LaSOT 42 (test) | Success Rate | 72.7 | 34 |