
Efficient Motion Prompt Learning for Robust Visual Tracking

About

Due to the challenges of processing temporal information, most trackers depend solely on visual discriminability and overlook the unique temporal coherence of video data. In this paper, we propose a lightweight and plug-and-play motion prompt tracking method. It can be easily integrated into existing vision-based trackers to build a joint tracking framework leveraging both motion and vision cues, thereby achieving robust tracking through efficient prompt learning. A motion encoder with three different positional encodings is proposed to encode the long-term motion trajectory into the visual embedding space, while a fusion decoder and an adaptive weight mechanism are designed to dynamically fuse visual and motion features. We integrate our motion module into three different trackers (five models in total). Experiments on seven challenging tracking benchmarks demonstrate that the proposed motion module significantly improves the robustness of vision-based trackers, with minimal training costs and negligible speed sacrifice. Code is available at https://github.com/zj5559/Motion-Prompt-Tracking.
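The abstract describes three components: a motion encoder that maps a long-term box trajectory into the visual embedding space with a positional encoding, and a fusion step that blends motion and visual features through an adaptive weight. The sketch below illustrates that idea only; the function names, the random stand-in projection, and the scalar sigmoid gate are assumptions for illustration, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

def sinusoidal_pe(positions, dim):
    """Sinusoidal temporal positional encoding (one of several PE choices)."""
    pe = np.zeros((len(positions), dim))
    div = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    pe[:, 0::2] = np.sin(np.outer(positions, div))
    pe[:, 1::2] = np.cos(np.outer(positions, div))
    return pe

def encode_motion(trajectory, dim):
    """Project a (T, 4) box trajectory into a dim-d motion prompt.

    A fixed random matrix stands in for a learned linear layer; mean-pooling
    stands in for the paper's motion encoder.
    """
    T = trajectory.shape[0]
    proj = np.random.default_rng(0).standard_normal((4, dim)) / np.sqrt(4.0)
    tokens = trajectory @ proj + sinusoidal_pe(np.arange(T), dim)
    return tokens.mean(axis=0)

def fuse(visual_feat, motion_feat, w):
    """Adaptive weighted fusion: a sigmoid gate in [0, 1] blends the motion
    prompt into the visual feature (a scalar stand-in for the paper's
    adaptive weight mechanism)."""
    g = 1.0 / (1.0 + np.exp(-w))
    return (1.0 - g) * visual_feat + g * motion_feat
```

For example, a 5-frame trajectory of `[x, y, w, h]` boxes yields an 8-d motion prompt via `encode_motion(traj, 8)`, which `fuse` then mixes with the tracker's visual feature; with `w = 0` the gate is 0.5 and the two cues contribute equally.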

Jie Zhao, Xin Chen, Yongsheng Yuan, Michael Felsberg, Dong Wang, Huchuan Lu• 2025

Related benchmarks

Task                     Dataset        Metric          Result   Rank
Object Tracking          LaSOT          AUC             73.9     411
Object Tracking          TrackingNet    Precision (P)   86.2     270
Visual Object Tracking   TNL2K          AUC             60.4     121
Single Object Tracking   LaSOT-ext      AUC             52.8     42
Visual Tracking          VOT 2018       EAO             0.469    24
Visual Tracking          VOT STB 2022   EAO             57.9     17
Visual Tracking          VOT 2020       EAO             0.341    15
