Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking
About
In this study, we propose a novel RGB-T tracking framework that jointly models appearance and motion cues. First, to obtain a robust appearance model, we develop a novel late-fusion method to infer fusion weight maps for the RGB and thermal (T) modalities. The fusion weights are predicted by offline-trained global and local multimodal fusion networks and are then used to linearly combine the response maps of the RGB and T modalities. Second, when the appearance cue is unreliable, we comprehensively take motion cues, i.e., target and camera motion, into account to keep the tracker robust. We further propose a tracker switcher that switches flexibly between the appearance and motion trackers. Extensive results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms.
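The late-fusion step can be illustrated with a minimal sketch: given per-modality response maps and per-pixel fusion weight maps, the fused response is their pixel-wise weighted sum. This is only an illustration under stated assumptions; the function name `fuse_responses` is hypothetical, and in the paper the weight maps come from the offline-trained global and local fusion networks rather than being hand-set.

```python
import numpy as np

def fuse_responses(resp_rgb, resp_t, w_rgb, w_t, eps=1e-8):
    """Linearly combine RGB and thermal response maps with
    per-pixel fusion weight maps (illustrative sketch; the real
    weights are inferred by the fusion networks in the paper)."""
    # normalize the two weight maps so they sum to 1 at each pixel
    total = np.clip(w_rgb + w_t, eps, None)
    w_rgb, w_t = w_rgb / total, w_t / total
    return w_rgb * resp_rgb + w_t * resp_t

# toy example: 5x5 response maps, RGB trusted more than thermal
rgb_resp = np.random.rand(5, 5)
t_resp = np.random.rand(5, 5)
fused = fuse_responses(rgb_resp, t_resp,
                       np.full((5, 5), 0.7), np.full((5, 5), 0.3))
```

The target location would then be read off as the peak of `fused`, as in standard response-map trackers.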
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| RGB-T Tracking | RGBT234 (test) | Precision Rate | 79 | 189 |
| RGB-T Tracking | GTOT | PR | 90.2 | 114 |
| RGB-T Tracking | RGBT234 (test) | MSR | 57.3 | 41 |
| RGB-T Tracking | LasHeR | PR | 46.7 | 41 |
| RGB-T Tracking | VOT-RGBT2019 | EAO | 49.8 | 40 |
| RGB-T Tracking | RGBT210 (test) | PR | 78.3 | 32 |
| RGB-T Tracking | GTOT (test) | PR | 90.2 | 19 |