OVTrack: Open-Vocabulary Multiple Object Tracking
About
The ability to recognize, localize and track dynamic objects in a scene is fundamental to many real-world applications, such as self-driving and robotic systems. Yet, traditional multiple object tracking (MOT) benchmarks rely only on a few object categories that hardly represent the multitude of possible objects that are encountered in the real world. This leaves contemporary MOT methods limited to a small set of pre-defined object categories. In this paper, we address this limitation by tackling a novel task, open-vocabulary MOT, that aims to evaluate tracking beyond pre-defined training categories. We further develop OVTrack, an open-vocabulary tracker that is capable of tracking arbitrary object classes. Its design is based on two key ingredients: First, leveraging vision-language models for both classification and association via knowledge distillation; second, a data hallucination strategy for robust appearance feature learning from denoising diffusion probabilistic models. The result is an extremely data-efficient open-vocabulary tracker that sets a new state-of-the-art on the large-scale, large-vocabulary TAO benchmark, while being trained solely on static images. Project page: https://www.vis.xyz/pub/ovtrack/
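The core of the first ingredient is that classification is not done against a fixed label head but by comparing a region's image embedding with text embeddings of arbitrary class names, as popularized by vision-language models such as CLIP. The sketch below illustrates that idea under simplifying assumptions: the embeddings are assumed to be precomputed (the actual model, prompts, and temperature used by OVTrack are not specified here), and `cosine_classify` is a hypothetical helper, not part of the OVTrack codebase.

```python
import numpy as np

def cosine_classify(region_emb, text_embs, class_names, temperature=0.01):
    """Assign a region the class whose text embedding is most similar.

    region_emb: (D,) embedding of a detected region, distilled from a
        vision-language model's image encoder (assumed precomputed).
    text_embs: (C, D) text embeddings of the C candidate class names.
    class_names: list of C class-name strings; can be extended at test
        time, which is what makes the classifier open-vocabulary.
    temperature: softmax temperature (illustrative value, not from the paper).
    """
    # L2-normalize so the dot product is cosine similarity.
    r = region_emb / np.linalg.norm(region_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = t @ r / temperature
    # Numerically stable softmax over the candidate classes.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return class_names[int(np.argmax(probs))], probs
```

Because the class list is just an input, tracking a new category only requires embedding its name with the text encoder; no retraining is involved, which matches the open-vocabulary setting the paper targets.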
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-Object Tracking | BDD100K (val) | -- | 70 |
| Multi-Object Tracking | TAO (val) | AssocA: 36.7 | 40 |
| Generic Multiple Object Tracking | Refer-GMOT40 | MOTA: 27.78 | 26 |
| Object Tracking | TAO | TETA: 34.7 | 22 |
| Multi-Object Tracking | TAO 1.0 (val) | Base TETA: 36.3 | 14 |
| Multi-Object Tracking | TAO (test) | -- | 13 |
| Multi-Object Tracking | TAO 1.0 (test) | Base TETA: 34.8 | 8 |
| Closed-set MOT Track mAP comparison | TAO 1.0 (val) | Track mAP50: 0.212 | 8 |
| Multi-Object Tracking | TAO Base classes | TETA: 35.5 | 6 |
| Multi-Object Tracking | TAO (Novel classes) | TETA: 27.8 | 6 |