OVTrack: Open-Vocabulary Multiple Object Tracking
About
The ability to recognize, localize and track dynamic objects in a scene is fundamental to many real-world applications, such as self-driving and robotic systems. Yet, traditional multiple object tracking (MOT) benchmarks rely only on a few object categories that hardly represent the multitude of possible objects that are encountered in the real world. This leaves contemporary MOT methods limited to a small set of pre-defined object categories. In this paper, we address this limitation by tackling a novel task, open-vocabulary MOT, that aims to evaluate tracking beyond pre-defined training categories. We further develop OVTrack, an open-vocabulary tracker that is capable of tracking arbitrary object classes. Its design is based on two key ingredients: First, leveraging vision-language models for both classification and association via knowledge distillation; second, a data hallucination strategy for robust appearance feature learning from denoising diffusion probabilistic models. The result is an extremely data-efficient open-vocabulary tracker that sets a new state-of-the-art on the large-scale, large-vocabulary TAO benchmark, while being trained solely on static images. Project page: https://www.vis.xyz/pub/ovtrack/
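The core of the first ingredient is that classification is not done against a fixed label head but by comparing a region's image embedding with text embeddings of arbitrary class names, as popularized by vision-language models such as CLIP. The sketch below illustrates that idea under simplifying assumptions: the embeddings are assumed to be precomputed (the actual model, prompts, and temperature used by OVTrack are not specified here), and `cosine_classify` is a hypothetical helper, not part of the OVTrack codebase.

```python
import numpy as np

def cosine_classify(region_emb, text_embs, class_names, temperature=0.01):
    """Assign a region the class whose text embedding is most similar.

    region_emb: (D,) embedding of a detected region, distilled from a
        vision-language model's image encoder (assumed precomputed).
    text_embs: (C, D) text embeddings of the C candidate class names.
    class_names: list of C class-name strings; can be extended at test
        time, which is what makes the classifier open-vocabulary.
    temperature: softmax temperature (illustrative value, not from the paper).
    """
    # L2-normalize so the dot product is cosine similarity.
    r = region_emb / np.linalg.norm(region_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = t @ r / temperature
    # Numerically stable softmax over the candidate classes.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return class_names[int(np.argmax(probs))], probs
```

Because the class list is just an input, tracking a new category only requires embedding its name with the text encoder; no retraining is involved, which matches the open-vocabulary setting the paper targets.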
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-Object Tracking | BDD100K (val) | -- | 70 |
| Multi-Object Tracking | TAO (val) | AssocA: 36.7 | 40 |
| Generic Multiple Object Tracking | Refer-GMOT40 | MOTA: 27.78 | 26 |
| Object Tracking | TAO | TETA: 34.7 | 22 |
| Multi-Object Tracking | TAO 1.0 (val) | Base TETA: 36.3 | 14 |
| Multi-Object Tracking | TAO (test) | -- | 13 |
| Multi-Object Tracking | TAO 1.0 (test) | Base TETA: 34.8 | 8 |
| Closed-set MOT Track mAP comparison | TAO 1.0 (val) | Track mAP50: 0.212 | 8 |
| Multi-Object Tracking | TAO Base classes | TETA: 35.5 | 6 |
| Multi-Object Tracking | TAO (Novel classes) | TETA: 27.8 | 6 |