Tracking by Instance Detection: A Meta-Learning Approach
About
We consider the tracking problem as a special type of object detection problem, which we call instance detection. With proper initialization, a detector can be quickly converted into a tracker by learning the new instance from a single image. We find that model-agnostic meta-learning (MAML) offers a strategy to initialize the detector that satisfies our needs. We propose a principled three-step approach to build a high-performance tracker. First, pick any modern object detector trained with gradient descent. Second, conduct offline training (or initialization) with MAML. Third, perform domain adaptation using the initial frame. We follow this procedure to build two trackers, named Retina-MAML and FCOS-MAML, based on two modern detectors, RetinaNet and FCOS. Evaluations on four benchmarks show that both trackers are competitive against state-of-the-art trackers. On OTB-100, Retina-MAML achieves the highest-ever AUC of 0.712. On TrackingNet, FCOS-MAML ranks first on the leaderboard with an AUC of 0.757 and a normalized precision of 0.822. Both trackers run in real time at 40 FPS.
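The three-step procedure above can be illustrated with a toy sketch. This is not the authors' implementation: a scalar linear model `y_hat = w * x` stands in for the detector, each "task" is a line with a different slope (standing in for a different target instance), and all function names and hyperparameters (`alpha`, `beta`, `steps`) are hypothetical choices made for illustration. The inner loop takes one gradient step on a small support set, which plays the role of the initial frame, and the outer loop updates the initialization using the exact MAML gradient, which for this model has a closed form.

```python
import numpy as np

def loss_and_grad(w, x, y):
    """Mean squared error of the toy model y_hat = w * x, and its gradient in w."""
    err = w * x - y
    return float(np.mean(err ** 2)), float(2.0 * np.mean(x * err))

def adapt(w, x_support, y_support, alpha):
    """Inner loop: one gradient step on the support set (the 'initial frame')."""
    _, g = loss_and_grad(w, x_support, y_support)
    return w - alpha * g

def maml_train(slopes, w0=0.0, alpha=0.1, beta=0.05, steps=500, seed=0):
    """Outer loop: learn an initialization w that performs well after one inner step."""
    rng = np.random.default_rng(seed)
    w = w0
    for _ in range(steps):
        meta_grad = 0.0
        for a in slopes:
            # Sample a support set and a query set for the task y = a * x.
            xs = rng.uniform(-1.0, 1.0, size=8)
            xq = rng.uniform(-1.0, 1.0, size=8)
            w_adapted = adapt(w, xs, a * xs, alpha)
            _, g_query = loss_and_grad(w_adapted, xq, a * xq)
            # Chain rule through the inner step:
            # d(w_adapted)/dw = 1 - 2 * alpha * mean(xs^2) for this model.
            meta_grad += g_query * (1.0 - 2.0 * alpha * np.mean(xs ** 2))
        w = w - beta * meta_grad / len(slopes)
    return float(w)

# Step 2 (offline): meta-learn the initialization over a set of toy tasks.
w_init = maml_train([1.0, 3.0])

# Step 3 (online): adapt to one new task from a single small support set.
x_first = np.linspace(-1.0, 1.0, 9)
w_tracker = adapt(w_init, x_first, 3.0 * x_first, alpha=0.1)
```

In a real detector the chain-rule factor is not available in closed form, so it would be obtained by differentiating through the inner update with an automatic differentiation framework, or dropped entirely in a first-order approximation.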
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Object Tracking | TrackingNet (test) | Normalized Precision (Pnorm) | 82.2 | 460 |
| Visual Object Tracking | LaSOT (test) | AUC | 52.3 | 444 |
| Object Tracking | LaSoT | AUC | 52.3 | 333 |
| Object Tracking | TrackingNet | Precision (P) | 72.5 | 225 |
| Visual Object Tracking | OTB-100 | AUC | 71.2 | 136 |
| Visual Object Tracking | VOT 2018 (test) | EAO | 0.452 | 54 |
| Visual Object Tracking | LaSOT 1.0 (test) | AUC | 52.3 | 42 |
| Single Object Tracking | TrackingNet 57 (test) | AUC | 75.7 | 20 |
| Single Object Tracking | LaSOT 23 (test) | AUC | 52.3 | 20 |