
ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe

About

We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where to look (localization) and how to describe (appearance analysis) the target object across video frames. Building on the foundation of its predecessor, ARTrackV2 extends the concept by introducing a unified generative framework that "reads out" the object's trajectory and "retells" its appearance in an autoregressive manner. This approach fosters a time-continuous methodology that models the joint evolution of motion and visual features, guided by previous estimates. Furthermore, ARTrackV2 stands out for its efficiency and simplicity, obviating the less efficient intra-frame autoregression and the hand-tuned parameters for appearance updates. Despite its simplicity, ARTrackV2 achieves state-of-the-art performance on prevailing benchmark datasets while delivering a remarkable efficiency improvement. In particular, ARTrackV2 achieves an AO score of 79.5% on GOT-10k and an AUC of 86.1% on TrackingNet while being 3.6× faster than ARTrack. The code will be released.
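To make "reading out" a trajectory guided by previous estimates more concrete, here is a minimal sketch of the trajectory side only. It assumes a coordinate-tokenization scheme of the kind used by sequence-based trackers: normalized box coordinates are discretized into a fixed number of bins, and the boxes from the last few frames are kept as a token prompt for the next prediction. The bin count (400), the prompt length (3), and the names `quantize_box`, `dequantize_tokens`, and `TrajectoryPrompt` are hypothetical illustration choices, not details from the abstract. No model and no appearance tokens are included. Consistent with the abstract's point that ARTrackV2 drops intra-frame autoregression, the sketch handles a frame's four tokens as one unit, but how the actual model predicts them is not shown.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

# Box as (x1, y1, x2, y2), assumed normalized to [0, 1] relative to the search region.
Box = Tuple[float, float, float, float]


def quantize_box(box: Box, num_bins: int = 400) -> List[int]:
    """Discretize each coordinate into an integer token in [0, num_bins - 1].

    num_bins=400 is a hypothetical choice, not a value from the paper.
    """
    tokens = []
    for c in box:
        c = min(max(c, 0.0), 1.0)  # clamp out-of-range coordinates
        tokens.append(min(int(c * num_bins), num_bins - 1))  # 1.0 maps to the last bin
    return tokens


def dequantize_tokens(tokens: Sequence[int], num_bins: int = 400) -> Box:
    """Map four tokens back to the centres of their bins."""
    x1, y1, x2, y2 = ((t + 0.5) / num_bins for t in tokens)
    return (x1, y1, x2, y2)


class TrajectoryPrompt:
    """Hypothetical fixed-length buffer of box tokens from previous frames."""

    def __init__(self, length: int = 3, num_bins: int = 400) -> None:
        self.num_bins = num_bins
        # deque with maxlen silently drops the oldest frame once full
        self.history: Deque[List[int]] = deque(maxlen=length)

    def push(self, box: Box) -> None:
        """Record the box estimated for the latest frame as four tokens."""
        self.history.append(quantize_box(box, self.num_bins))

    def as_sequence(self) -> List[int]:
        """Flatten the history oldest-to-newest into one conditioning sequence."""
        return [t for frame in self.history for t in frame]


if __name__ == "__main__":
    prompt = TrajectoryPrompt(length=3)
    track = [
        (0.10, 0.20, 0.30, 0.40),
        (0.12, 0.21, 0.32, 0.41),
        (0.14, 0.22, 0.34, 0.42),
        (0.16, 0.23, 0.36, 0.43),
    ]
    for box in track:
        prompt.push(box)
    print(len(prompt.as_sequence()))  # 12: only the last 3 frames are kept
    print(dequantize_tokens(quantize_box((0.25, 0.5, 0.125, 0.75))))
```

Each frame contributes four tokens, so a prompt of length 3 holds 12 tokens. Because dequantization returns bin centres, the round-trip error per coordinate is at most half a bin width, 1/(2·num_bins).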

Yifan Bai, Zeyang Zhao, Yihong Gong, Xing Wei • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Object Tracking | TrackingNet (test) | Normalized Precision (Pnorm) | 90.4 | 463 |
| Visual Object Tracking | LaSOT (test) | AUC | 73.6 | 446 |
| Object Tracking | LaSOT | AUC | 73.6 | 411 |
| Visual Object Tracking | GOT-10k (test) | Average Overlap | 79.5 | 408 |
| Object Tracking | TrackingNet | Precision (P) | 86.2 | 270 |
| Visual Object Tracking | GOT-10k | AO | 79.5 | 254 |
| Visual Object Tracking | UAV123 (test) | AUC | 69.9 | 188 |
| Visual Object Tracking | UAV123 | AUC | 0.717 | 172 |
| Visual Object Tracking | TNL2K | AUC | 61.6 | 121 |
| Visual Object Tracking | NfS | AUC | 0.684 | 112 |

Showing 10 of 18 rows.
