Rethinking the competition between detection and ReID in Multi-Object Tracking
About
Owing to their balance of accuracy and speed, one-shot models, which jointly learn detection and identification embeddings, have drawn great attention in multi-object tracking (MOT). However, because the one-shot tracking paradigm treats detection and re-identification (ReID) as two isolated tasks, the inherent differences and relations between them are overlooked. This leads to inferior performance compared with existing two-stage methods. In this paper, we first dissect the reasoning process for these two tasks, which reveals that the competition between them inevitably harms the learning of task-dependent representations. To tackle this problem, we propose a novel reciprocal network (REN) with a self-relation and cross-relation design that impels each branch to better learn task-dependent representations. The proposed model aims to alleviate the harmful competition between the tasks while improving the cooperation between detection and ReID. Furthermore, we introduce a scale-aware attention network (SAAN) that prevents semantic-level misalignment to improve the association capability of ID embeddings. By integrating the two carefully designed networks into a one-shot online MOT system, we construct a strong MOT tracker, namely CSTrack. Our tracker achieves state-of-the-art performance on the MOT16, MOT17 and MOT20 datasets, without bells and whistles. Moreover, CSTrack is efficient and runs at 16.4 FPS on a single modern GPU, and its lightweight version runs at 34.6 FPS. The complete code has been released at https://github.com/JudasDie/SOTS.
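
The abstract names a self-relation and cross-relation design but does not spell out the layer. The sketch below is one plausible reading, not the released implementation: it assumes channel-wise relation maps built from softmax-normalized inner products of two task-specific projections of a shared feature map, mixed by a scalar weight. The function `reciprocal_features`, the projections `w_det` and `w_id`, and the weights `lam_det` and `lam_id` are all hypothetical names introduced for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def reciprocal_features(feat, w_det, w_id, lam_det=0.5, lam_id=0.5):
    """Illustrative self-/cross-relation mixing (an assumption, not CSTrack's code).

    feat:  (C, N) shared feature map, flattened over spatial positions.
    w_det: (C, C) hypothetical projection for the detection branch.
    w_id:  (C, C) hypothetical projection for the ReID branch.
    Returns one (C, N) feature map per branch.
    """
    m_det = w_det @ feat  # detection-specific view, (C, N)
    m_id = w_id @ feat    # ReID-specific view, (C, N)

    # Self-relation: how channels of one task relate to each other.
    self_det = softmax(m_det @ m_det.T)  # (C, C)
    self_id = softmax(m_id @ m_id.T)     # (C, C)

    # Cross-relation: how channels of one task relate to the other task.
    cross_det = softmax(m_det @ m_id.T)  # (C, C)
    cross_id = softmax(m_id @ m_det.T)   # (C, C)

    # Each branch mixes its own self-relation with a cross-relation.
    rel_det = lam_det * self_det + (1.0 - lam_det) * cross_det
    rel_id = lam_id * self_id + (1.0 - lam_id) * cross_id

    # Residual connection keeps the shared features available to both branches.
    feat_det = feat + rel_det @ feat
    feat_id = feat + rel_id @ feat
    return feat_det, feat_id
```

In this reading, the mixing weights control how much each branch relies on its own channel statistics versus those shared with the other task; in a trained model they would be learned rather than fixed.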
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multiple Object Tracking | MOT17 (test) | MOTA 74.9 | 921 |
| Multiple Object Tracking | MOT20 (test) | MOTA 68.6 | 358 |
| Multi-Object Tracking | MOT16 (test) | MOTA 75.6 | 228 |
| Multi-Object Tracking | MOT 2016 (test) | MOTA 75.6 | 59 |
| Multi-Object Tracking | MOT17 1.0 (test) | MOTA 74.9 | 48 |
| Multi-Object Tracking | MOT 2020 (test) | MOTA 66.6 | 44 |
| Multi-Object Tracking | BFT 1.0 (test) | Detection Accuracy 47 | 37 |
| Multi-Object Tracking | MOT 2017 (test) | MOTA 74.9 | 34 |
| Multi-Object Tracking | Human in Events (HiEve) (test) | MOTA 48.6 | 26 |
| Multi-Object Tracking | KITTI Cars (test) | MOTA 87.3 | 20 |
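
Most rows above report MOTA. As a reading aid, here is a minimal sketch of how that score is usually computed, assuming the standard CLEAR MOT definition (one minus the ratio of false negatives, false positives and identity switches to the number of ground-truth boxes, summed over all frames). The function name `mota_percent` and its parameters are illustrative and do not come from the CSTrack code base.

```python
def mota_percent(false_negatives, false_positives, id_switches, num_gt):
    """Multi-object tracking accuracy as a percentage.

    All four arguments are counts accumulated over every frame of a sequence
    (or benchmark). The score can be negative when errors outnumber
    ground-truth boxes.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 100.0 * (1.0 - errors / num_gt)


# Hypothetical counts, chosen only to show the call.
score = mota_percent(false_negatives=10, false_positives=5, id_switches=1, num_gt=100)
```

Because all three error types share one denominator, a tracker can trade missed detections against false positives without changing its MOTA, which is why benchmarks often report additional metrics alongside it.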