
ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe

About

We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where to look (localization) and how to describe (appearance analysis) the target object across video frames. Building on the foundation of its predecessor, ARTrackV2 extends the concept by introducing a unified generative framework to "read out" the object's trajectory and "retell" its appearance in an autoregressive manner. This approach fosters a time-continuous methodology that models the joint evolution of motion and visual features, guided by previous estimates. Furthermore, ARTrackV2 stands out for its efficiency and simplicity, obviating the less efficient intra-frame autoregression and hand-tuned parameters for appearance updates. Despite its simplicity, ARTrackV2 achieves state-of-the-art performance on prevailing benchmark datasets while demonstrating a remarkable efficiency improvement. In particular, ARTrackV2 achieves an AO score of 79.5% on GOT-10k and an AUC of 86.1% on TrackingNet while being 3.6× faster than ARTrack. The code will be released.

Yifan Bai, Zeyang Zhao, Yihong Gong, Xing Wei • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Object Tracking | TrackingNet (test) | Normalized Precision (Pnorm) | 90.4 | 460 |
| Visual Object Tracking | LaSOT (test) | AUC | 73.6 | 444 |
| Visual Object Tracking | GOT-10k (test) | Average Overlap | 79.5 | 378 |
| Object Tracking | LaSoT | AUC | 73.6 | 333 |
| Object Tracking | TrackingNet | Precision (P) | 86.2 | 225 |
| Visual Object Tracking | GOT-10k | AO | 79.5 | 223 |
| Visual Object Tracking | UAV123 (test) | AUC | 69.9 | 188 |
| Visual Object Tracking | UAV123 | AUC | 0.717 | 165 |
| Visual Object Tracking | NfS | AUC | 0.684 | 112 |
| Visual Object Tracking | LaSOText (test) | AUC | 53.4 | 85 |
Showing 10 of 15 rows
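
For readers unfamiliar with the Average Overlap (AO) metric reported for GOT-10k, it is the mean intersection-over-union (IoU) between predicted and ground-truth boxes across frames. The snippet below is a minimal sketch of that idea for a single sequence, not the benchmark's official evaluation code. The `(x, y, w, h)` box format and the names `iou` and `average_overlap` are assumptions made for illustration, and the official toolkit applies protocol details (such as which frames are scored) that are omitted here.

```python
from typing import Sequence, Tuple

# Assumed box format: (x, y, w, h) with (x, y) the top-left corner.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """Mean IoU over paired predicted and ground-truth boxes (hypothetical helper)."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds)


# Toy two-frame sequence: a perfect match, then a box shifted by (5, 5).
preds = [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 10.0, 10.0)]
gts = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
ao = average_overlap(preds, gts)
print(f"AO = {ao:.4f}")
```

The function returns a fraction in [0, 1]; multiplying by 100 gives the percentage scale used for the GOT-10k entries in the table, so 0.795 corresponds to 79.5.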
