SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
About
Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness. Early research focused on fully fine-tuning RGB-based trackers, which was inefficient and, given the scarcity of multimodal data, failed to learn generalized representations. Therefore, recent studies have used prompt tuning to transfer pre-trained RGB-based trackers to multimodal data. However, the modality gap limits the recall of pre-trained knowledge, and the dominance of the RGB modality persists, preventing full use of the information from other modalities. To address these issues, we propose a novel symmetric multimodal tracking framework called SDSTrack. We introduce lightweight adaptation for efficient fine-tuning, which directly transfers the feature extraction ability from RGB to other domains with a small number of trainable parameters and integrates multimodal features in a balanced, symmetric manner. Furthermore, we design a complementary masked patch distillation strategy to enhance the robustness of trackers in complex environments, such as extreme weather, poor imaging, and sensor failure. Extensive experiments demonstrate that SDSTrack outperforms state-of-the-art methods in various multimodal tracking scenarios, including RGB+Depth, RGB+Thermal, and RGB+Event tracking, and achieves strong results under extreme conditions. Our source code is available at https://github.com/hoqolo/SDSTrack.
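The two mechanisms named in the abstract can be illustrated with a small, framework-free Python sketch. This is not the SDSTrack implementation. It makes several assumptions. "Complementary" masking is taken to mean that no patch position is hidden in both the RGB and the auxiliary (depth, thermal, or event) stream. Zero vectors stand in for a learned mask token. A hypothetical `fuse` function (element-wise mean) stands in for the tracker. Mean squared error serves as the self-distillation loss between the output on full inputs (teacher) and the output on masked inputs (student). The hypothetical `adapter_params` helper assumes a standard bottleneck adapter (down-projection, up-projection, biases) to show why adapter tuning trains few parameters.

```python
import random
from typing import List, Sequence, Tuple

Token = List[float]  # one patch embedding (toy representation)


def complementary_masks(num_patches: int, mask_ratio: float,
                        seed: int = 0) -> Tuple[List[bool], List[bool]]:
    """Choose patch positions to mask, giving each one to exactly one modality.

    Assumption: "complementary" means no position is masked in both the RGB
    and the auxiliary stream, so every location stays visible in at least one.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)
    masked = rng.sample(range(num_patches), int(num_patches * mask_ratio))
    rgb_mask = [False] * num_patches
    aux_mask = [False] * num_patches
    for i, pos in enumerate(masked):
        # Positions arrive in random order; alternating splits them evenly.
        if i % 2 == 0:
            rgb_mask[pos] = True
        else:
            aux_mask[pos] = True
    return rgb_mask, aux_mask


def apply_mask(tokens: Sequence[Token], mask: Sequence[bool]) -> List[Token]:
    """Replace masked patch embeddings with zero vectors (stand-in for a mask token)."""
    return [[0.0] * len(t) if m else list(t) for t, m in zip(tokens, mask)]


def fuse(rgb: Sequence[Token], aux: Sequence[Token]) -> List[Token]:
    """Hypothetical symmetric fusion: element-wise mean of the two modalities."""
    return [[(a + b) / 2 for a, b in zip(r, x)] for r, x in zip(rgb, aux)]


def distillation_loss(student: Sequence[Token], teacher: Sequence[Token]) -> float:
    """Mean squared error between student (masked) and teacher (full) features."""
    total, count = 0.0, 0
    for s_vec, t_vec in zip(student, teacher):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count if count else 0.0


def adapter_params(dim: int, bottleneck: int) -> int:
    """Trainable parameters of an assumed bottleneck adapter (weights + biases)."""
    return dim * bottleneck + bottleneck + bottleneck * dim + dim


if __name__ == "__main__":
    rng = random.Random(1)
    n, dim = 16, 4
    rgb = [[rng.random() for _ in range(dim)] for _ in range(n)]
    aux = [[rng.random() for _ in range(dim)] for _ in range(n)]
    rgb_mask, aux_mask = complementary_masks(n, mask_ratio=0.5, seed=0)
    teacher = fuse(rgb, aux)  # full inputs; a real setup would stop gradients here
    student = fuse(apply_mask(rgb, rgb_mask), apply_mask(aux, aux_mask))
    print("distillation loss:", distillation_loss(student, teacher))
```

With `dim=768` and `bottleneck=8`, `adapter_params` gives 13,064 trainable parameters, versus 590,592 for a single full 768×768 linear layer with bias. These dimensions are illustrative and not taken from the paper.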
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| RGB-T Tracking | LasHeR (test) | PR | 66.7 | 244 |
| RGB-T Tracking | RGBT234 (test) | Precision Rate | 84.8 | 189 |
| RGB-D Object Tracking | VOT-RGBD 2022 (public challenge) | EAO | 72.8 | 167 |
| RGB-D Object Tracking | DepthTrack (test) | Precision | 61.9 | 145 |
| RGB-T Tracking | RGBT234 | Precision | 84.8 | 98 |
| RGBT Tracking | RGBT234 | PR | 84.8 | 65 |
| Object Tracking | VisEvent (test) | PR | 76.7 | 63 |
| RGBT Tracking | LasHeR | PR | 66.5 | 55 |
| RGBT Tracking | RGBT 234 | Precision Rate | 84.8 | 53 |
| Object Tracking | COESOT (test) | SR | 66.7 | 50 |