Improving Visual Object Tracking through Visual Prompting

About

Learning a discriminative model that distinguishes the specified target from surrounding distractors across frames is essential for generic object tracking (GOT). Dynamic adaptation of target representation against distractors remains challenging because prevailing trackers exhibit limited discriminative capability. To address this issue, we present a new visual prompting mechanism for generic object tracking, termed PiVOT. PiVOT introduces mechanisms that leverage the pretrained foundation model (CLIP) to automatically generate and refine visual prompts online, thereby enabling the tracker to suppress distractors through contrastive guidance. To transfer contrastive knowledge from the foundation model to the tracker, PiVOT automatically propagates this knowledge online and dynamically generates and updates visual prompts. Specifically, it proposes a prompt initialization mechanism that produces an initial visual prompt highlighting potential target locations. The foundation model is then used to refine the prompt based on appearance similarities between candidate objects and reference templates across potential targets. After refinement, the visual prompt better highlights potential target locations and reduces irrelevant prompt information. With the proposed prompting mechanism, the tracker can generate instance-aware feature maps guided by the visual prompts, which are incrementally and automatically updated during tracking, thereby effectively suppressing distractors. Extensive experiments across multiple benchmarks indicate that PiVOT, with the proposed prompting mechanism, can suppress distracting objects and improve tracking performance.
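The refinement step described above, in which candidate locations are rescored by their appearance similarity to a reference template, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how similarity is measured or how the prompt is updated, so the cosine similarity, the hard `threshold`, and the names `cosine_similarity` and `refine_prompt` are all assumptions made for illustration. The short embedding lists stand in for features that a foundation model such as CLIP would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def refine_prompt(candidates, template_embedding, threshold=0.5):
    """Rescore candidates by similarity to the template (hypothetical rule).

    Each candidate is a dict with 'position', 'score', and 'embedding'.
    Candidates whose similarity falls below `threshold` get a score of 0.0;
    the rest have their initial score multiplied by the similarity.
    """
    refined = []
    for cand in candidates:
        sim = cosine_similarity(cand["embedding"], template_embedding)
        weight = sim if sim >= threshold else 0.0
        refined.append(
            {
                "position": cand["position"],
                "score": cand["score"] * weight,
                "similarity": sim,
            }
        )
    return refined


# Toy data: 2-D vectors stand in for foundation-model image embeddings.
template = [1.0, 0.0]
candidates = [
    {"position": (10, 12), "score": 0.9, "embedding": [1.0, 0.0]},  # matches the template
    {"position": (40, 8), "score": 0.8, "embedding": [0.0, 1.0]},   # distractor
]
refined = refine_prompt(candidates, template)
for item in refined:
    print(item["position"], round(item["score"], 3))
```

In this toy setting the candidate whose embedding matches the template keeps its initial score, while the orthogonal one is suppressed to zero. A real tracker would presumably operate on dense score maps and learned features rather than a short list of candidates.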

Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin • 2024

Related benchmarks

Task                     Dataset              Result                             Rank
Visual Object Tracking   TrackingNet (test)   Normalized Precision (Pnorm): 90   463
Object Tracking          LaSoT                --                                 411
Visual Object Tracking   GOT-10k (test)       Average Overlap: 76.9              408
Object Tracking          TrackingNet          --                                 270
Visual Object Tracking   GOT-10k              AO: 76.9                           254
Visual Object Tracking   UAV123 (test)        --                                 188
Visual Object Tracking   OTB100 (test)        --                                 41
Visual Object Tracking   AVisT (test)         AUC: 62.2                          35
Visual Object Tracking   LaSOT 42 (test)      Success Rate: 73.4                 34
Visual Tracking          AVisT                --                                 33
Showing 10 of 15 rows
