FARTrack: Fast Autoregressive Visual Tracking with High Performance
About
Inference speed and tracking performance are two critical evaluation metrics in the field of visual tracking. However, high-performance trackers often suffer from slow processing speeds, making them impractical for deployment on resource-constrained devices. To alleviate this issue, we propose FARTrack, a Fast Auto-Regressive Tracking framework. Since autoregression emphasizes the temporal nature of the trajectory sequence, it can maintain high performance while achieving efficient execution across various devices. FARTrack introduces Task-Specific Self-Distillation and Inter-frame Autoregressive Sparsification, designed from the perspectives of shallow-yet-accurate distillation and redundant-to-essential token optimization, respectively. Task-Specific Self-Distillation achieves model compression by distilling task-specific tokens layer by layer, increasing the model's inference speed while avoiding suboptimal manual assignment of teacher-student layer pairs. Meanwhile, Inter-frame Autoregressive Sparsification sequentially condenses multiple templates, avoiding additional runtime overhead while learning a temporally global optimal sparsification strategy. FARTrack demonstrates outstanding speed and competitive performance. It delivers an AO of 70.6% on GOT-10k in real time. In addition, our fastest model achieves a speed of 343 FPS on the GPU and 121 FPS on the CPU.
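The abstract does not spell out the distillation loss, so the following is only a minimal sketch of one plausible reading of "distilling task-specific tokens layer by layer": each layer's task tokens are pulled toward those of the next deeper layer, which removes the need to hand-pick teacher-student layer pairs. The function name, the use of mean squared error, and the array shapes are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def layerwise_self_distillation_loss(task_tokens_per_layer):
    """Sum of MSE terms between each layer's task tokens and the next deeper layer's.

    task_tokens_per_layer: list of arrays of shape (num_task_tokens, dim),
    ordered from shallow to deep. The deeper layer of each adjacent pair is
    treated as a fixed target; gradients are not modelled in this sketch.
    MSE is an assumed choice of distance, not taken from the paper.
    """
    if len(task_tokens_per_layer) < 2:
        raise ValueError("need at least two layers to distill between")
    loss = 0.0
    pairs = zip(task_tokens_per_layer[:-1], task_tokens_per_layer[1:])
    for student, teacher in pairs:
        loss += float(np.mean((student - teacher) ** 2))
    return loss


# Toy example: three layers, four task tokens of dimension eight.
layers = [np.zeros((4, 8)), np.ones((4, 8)), np.ones((4, 8))]
loss = layerwise_self_distillation_loss(layers)
```

In this toy case only the first adjacent pair differs, so the loss reduces to the single MSE term between the first two layers.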
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Tracking | LaSOT | AUC | 63.2 | 333 |
| Object Tracking | TrackingNet | Precision (P) | 77.5 | 225 |
| Visual Object Tracking | UAV123 (test) | AUC | 65.8 | 188 |
| Visual Object Tracking | LaSOT_ext (test) | AUC | 45 | 85 |
| Object Tracking | GOT-10k | AO | 70.6 | 74 |
| Visual Object Tracking | NFS (Need for Speed) 30 FPS (test) | AUC | 66.9 | 54 |
| Visual Tracking | VastTrack (test) | AUC | 35.2 | 5 |
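For readers unfamiliar with the AO and AUC metrics reported above, the sketch below shows how they are commonly derived from per-frame box overlaps. It assumes boxes in `(x, y, w, h)` format, a strict `>` comparison, and 21 evenly spaced thresholds; official benchmark toolkits differ in such details (for example in how sequences or classes are averaged), so this is an illustration and not a replacement for them. All function names are hypothetical.

```python
import numpy as np


def iou_xywh(pred, gt):
    """Per-frame IoU for boxes given as (x, y, w, h), arrays of shape (N, 4)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    # Guard against degenerate boxes with zero union.
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


def average_overlap(ious):
    """AO: mean IoU over all frames (simplified; no per-class averaging)."""
    return float(np.mean(ious))


def success_auc(ious, num_thresholds=21):
    """Area under the success curve, approximated as the mean success rate."""
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    success = [float(np.mean(ious > t)) for t in thresholds]
    return float(np.mean(success))


# Two frames: a perfect prediction and one shifted by half the box width.
pred = [[0, 0, 10, 10], [0, 0, 10, 10]]
gt = [[0, 0, 10, 10], [5, 0, 10, 10]]
ious = iou_xywh(pred, gt)
ao = average_overlap(ious)
auc = success_auc(ious)
```

Multiplying `ao` or `auc` by 100 gives values on the same percentage scale as the table.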