Improving Visual Object Tracking through Visual Prompting

About

Learning a discriminative model that distinguishes the specified target from surrounding distractors across frames is essential for generic object tracking (GOT). Dynamic adaptation of target representation against distractors remains challenging because prevailing trackers exhibit limited discriminative capability. To address this issue, we present a new visual prompting mechanism for generic object tracking, termed PiVOT. PiVOT introduces mechanisms that leverage the pretrained foundation model (CLIP) to automatically generate and refine visual prompts online, thereby enabling the tracker to suppress distractors through contrastive guidance. To transfer contrastive knowledge from the foundation model to the tracker, PiVOT automatically propagates this knowledge online and dynamically generates and updates visual prompts. Specifically, it proposes a prompt initialization mechanism that produces an initial visual prompt highlighting potential target locations. The foundation model is then used to refine the prompt based on appearance similarities between candidate objects and reference templates across potential targets. After refinement, the visual prompt better highlights potential target locations and reduces irrelevant prompt information. With the proposed prompting mechanism, the tracker can generate instance-aware feature maps guided by the visual prompts, which are incrementally and automatically updated during tracking, thereby effectively suppressing distractors. Extensive experiments across multiple benchmarks indicate that PiVOT, with the proposed prompting mechanism, can suppress distracting objects and improve tracking performance.
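The refinement step described above, in which candidate locations are rescored by their appearance similarity to a reference template, can be illustrated with a toy sketch. This is not PiVOT's implementation. The short vectors stand in for features that a foundation model such as CLIP would produce, the names cosine_similarity and refine_prompt are hypothetical, and the multiplicative reweighting is an assumption chosen for simplicity.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def refine_prompt(candidate_scores, candidate_embeddings, template_embedding):
    """Reweight initial candidate scores by similarity to the template.

    Hypothetical helper: each initial score is multiplied by the
    non-negative cosine similarity between the candidate's embedding
    and the template embedding, so dissimilar candidates are suppressed.
    """
    refined = []
    for score, embedding in zip(candidate_scores, candidate_embeddings):
        similarity = cosine_similarity(embedding, template_embedding)
        refined.append(score * max(similarity, 0.0))
    return refined


# Toy data (assumed values, not taken from the paper).
template = [1.0, 0.0]                      # stand-in for the target template feature
candidates = [[1.0, 0.0],                  # looks like the target
              [0.0, 1.0],                  # distractor with a different appearance
              [1.0, 1.0]]                  # partially similar object
initial_scores = [0.6, 0.9, 0.7]           # initial prompt favors the distractor

refined_scores = refine_prompt(initial_scores, candidates, template)
best_index = max(range(len(refined_scores)), key=refined_scores.__getitem__)
print(refined_scores, best_index)
```

In this toy case the initial prompt ranks the distractor highest, while the refined scores rank the candidate that matches the template highest. A real tracker would operate on dense score maps and learned features rather than three hand-written vectors.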

Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin • 2024

Related benchmarks

Task	Dataset	Result
Visual Object Tracking	TrackingNet (test)	Normalized Precision (Pnorm) 90	502
Object Tracking	LaSOT	--	498
Visual Object Tracking	GOT-10k (test)	Average Overlap 76.9	450
Object Tracking	TrackingNet	--	327
Visual Object Tracking	GOT-10k	AO 76.9	306
Visual Object Tracking	UAV123 (test)	--	188
Visual Object Tracking	OTB100 (test)	--	41
Visual Object Tracking	AVisT (test)	AUC 62.2	35
Visual Object Tracking	LaSOT 42 (test)	Success Rate 73.4	34
Visual Tracking	AVisT	--	33

