Prompting for Multi-Modal Tracking

About

Multi-modal tracking has gained attention because it can be more accurate and robust than traditional RGB-based tracking in complex scenarios. Its key challenge is how to fuse multi-modal data and reduce the gap between modalities. However, multi-modal tracking still suffers severely from data deficiency, which leads to insufficient learning of fusion modules. Instead of building such a fusion module, in this paper we provide a new perspective on multi-modal tracking by attaching importance to multi-modal visual prompts. We design a novel multi-modal prompt tracker (ProTrack), which transfers the multi-modal inputs to a single modality through the prompt paradigm. By making full use of the tracking ability of pre-trained RGB trackers learned at scale, ProTrack can achieve high-performance multi-modal tracking by only altering the inputs, even without any extra training on multi-modal data. Extensive experiments on 5 benchmark datasets demonstrate the effectiveness of the proposed ProTrack.

Jinyu Yang, Zhe Li, Feng Zheng, Aleš Leonardis, Jingkuan Song • 2022
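
The abstract says the multi-modal inputs are transferred to a single modality but does not spell out how the prompt is formed. As a rough illustration only, the sketch below assumes a per-pixel weighted blend of the RGB frame with a min-max normalized auxiliary map (depth, thermal, or event). The function name `make_prompt`, the `weight` parameter, and the normalization step are hypothetical choices for this example, not details confirmed by the text above.

```python
import numpy as np


def make_prompt(rgb: np.ndarray, aux: np.ndarray, weight: float = 0.5) -> np.ndarray:
    """Blend an RGB frame with a single-channel auxiliary frame.

    rgb:    uint8 array of shape (H, W, 3).
    aux:    array of shape (H, W) in arbitrary units (e.g. depth or thermal).
    weight: hypothetical blending coefficient in [0, 1]; 1.0 keeps pure RGB.
    """
    if rgb.shape[:2] != aux.shape:
        raise ValueError("rgb and aux must share height and width")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    aux = aux.astype(np.float32)
    span = float(aux.max() - aux.min())
    # Rescale the auxiliary map to 0..255; a constant map becomes all zeros.
    if span > 0:
        aux_norm = (aux - aux.min()) / span * 255.0
    else:
        aux_norm = np.zeros_like(aux)

    # Replicate the single channel so it matches the RGB layout (assumed choice).
    aux_rgb = np.repeat(aux_norm[..., None], 3, axis=2)

    blended = weight * rgb.astype(np.float32) + (1.0 - weight) * aux_rgb
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)
```

Under this assumption, the blended frame has the same shape and dtype as an ordinary RGB image, so it could be passed to a pre-trained RGB tracker without changing the tracker itself, which is the property the abstract emphasizes.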

Related benchmarks

Task	Dataset	Result
RGB-D Object Tracking	VOT-RGBD 2022 (public challenge)	EAO 65.1	263
RGB-T Tracking	LasHeR (test)	PR 53.8	257
RGB-T Tracking	RGBT234 (test)	Precision Rate 79.5	203
RGB-D Object Tracking	DepthTrack (test)	Precision 58.3	181
RGB-T Tracking	RGBT234	Precision 79.5	121
RGBT Tracking	LasHeR	PR 53.8	120
RGBT Tracking	RGBT234	PR 79.5	112
Visual Object Tracking	DepthTrack	Recall 0.573	106
Object Tracking	VisEvent (test)	PR 63.2	63
RGBT Tracking	LasHeR	PR 53.8	62

Showing 10 of 33 rows
