
Efficient Motion Prompt Learning for Robust Visual Tracking

About

Due to the challenges of processing temporal information, most trackers depend solely on visual discriminability and overlook the unique temporal coherence of video data. In this paper, we propose a lightweight and plug-and-play motion prompt tracking method. It can be easily integrated into existing vision-based trackers to build a joint tracking framework leveraging both motion and vision cues, thereby achieving robust tracking through efficient prompt learning. A motion encoder with three different positional encodings is proposed to encode the long-term motion trajectory into the visual embedding space, while a fusion decoder and an adaptive weight mechanism are designed to dynamically fuse visual and motion features. We integrate our motion module into three different trackers (five models in total). Experiments on seven challenging tracking benchmarks demonstrate that the proposed motion module significantly improves the robustness of vision-based trackers, with minimal training costs and negligible speed sacrifice. Code is available at https://github.com/zj5559/Motion-Prompt-Tracking.
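The abstract describes three components: a motion encoder that maps a long-term box trajectory into the visual embedding space with a positional encoding, and a fusion step that blends motion and visual features through an adaptive weight. The sketch below illustrates that idea only; the function names, the random stand-in projection, and the scalar sigmoid gate are assumptions for illustration, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

def sinusoidal_pe(positions, dim):
    """Sinusoidal temporal positional encoding (one of several PE choices)."""
    pe = np.zeros((len(positions), dim))
    div = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    pe[:, 0::2] = np.sin(np.outer(positions, div))
    pe[:, 1::2] = np.cos(np.outer(positions, div))
    return pe

def encode_motion(trajectory, dim):
    """Project a (T, 4) box trajectory into a dim-d motion prompt.

    A fixed random matrix stands in for a learned linear layer; mean-pooling
    stands in for the paper's motion encoder.
    """
    T = trajectory.shape[0]
    proj = np.random.default_rng(0).standard_normal((4, dim)) / np.sqrt(4.0)
    tokens = trajectory @ proj + sinusoidal_pe(np.arange(T), dim)
    return tokens.mean(axis=0)

def fuse(visual_feat, motion_feat, w):
    """Adaptive weighted fusion: a sigmoid gate in [0, 1] blends the motion
    prompt into the visual feature (a scalar stand-in for the paper's
    adaptive weight mechanism)."""
    g = 1.0 / (1.0 + np.exp(-w))
    return (1.0 - g) * visual_feat + g * motion_feat
```

For example, a 5-frame trajectory of `[x, y, w, h]` boxes yields an 8-d motion prompt via `encode_motion(traj, 8)`, which `fuse` then mixes with the tracker's visual feature; with `w = 0` the gate is 0.5 and the two cues contribute equally.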

Jie Zhao, Xin Chen, Yongsheng Yuan, Michael Felsberg, Dong Wang, Huchuan Lu• 2025

Related benchmarks

Task                     Dataset        Metric          Result   Rank
Object Tracking          LaSOT          AUC             73.9     411
Object Tracking          TrackingNet    Precision (P)   86.2     270
Visual Object Tracking   TNL2K          AUC             60.4     121
Single Object Tracking   LaSOT-ext      AUC             52.8     42
Visual Tracking          VOT 2018       EAO             0.469    24
Visual Tracking          VOT STB 2022   EAO             57.9     17
Visual Tracking          VOT 2020       EAO             0.341    15
