Simple Cues Lead to a Strong Multi-Object Tracker

About

For a long time, the most common paradigm in Multi-Object Tracking was tracking-by-detection (TbD), where objects are first detected and then associated over video frames. For association, most models resorted to motion and appearance cues, e.g., re-identification networks. Recent attention-based approaches propose to learn the cues in a data-driven manner, showing impressive results. In this paper, we ask ourselves whether simple, good old TbD methods are also capable of achieving the performance of end-to-end models. To this end, we propose two key ingredients that allow a standard re-identification network to excel at appearance-based tracking. We extensively analyse its failure cases and show that combining our appearance features with a simple motion model leads to strong tracking results. Our tracker generalizes to four public datasets, namely MOT17, MOT20, BDD100k, and DanceTrack, achieving state-of-the-art performance. Code is available at https://github.com/dvl-tum/GHOST
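To make the association step of tracking-by-detection concrete, the following is a minimal sketch of how an appearance cue and a motion cue could be combined into one matching cost. It is not the authors' implementation: the function names, the `weight` and `max_cost` parameters, the use of IoU as the motion cue, and the greedy matching (a stand-in for the Hungarian matching that trackers commonly use) are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def iou_distance(box_a, box_b):
    """Return 1 - IoU for two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return 1.0 - inter / union if union > 0 else 1.0


def associate(tracks, detections, weight=0.5, max_cost=0.7):
    """Greedily match tracks to detections using a weighted cost.

    Each track/detection is a dict with a "box" and an "emb" entry.
    `weight` (hypothetical) balances appearance against motion, and
    `max_cost` (hypothetical) rejects implausible pairs.
    Returns a sorted list of (track_index, detection_index) pairs.
    """
    candidates = []
    for t, trk in enumerate(tracks):
        for d, det in enumerate(detections):
            appearance = cosine_distance(trk["emb"], det["emb"])
            motion = iou_distance(trk["box"], det["box"])
            cost = weight * appearance + (1.0 - weight) * motion
            if cost <= max_cost:
                candidates.append((cost, t, d))

    # Greedy assignment: take the cheapest pairs first, one match per item.
    candidates.sort()
    used_tracks, used_dets, matches = set(), set(), []
    for _, t, d in candidates:
        if t in used_tracks or d in used_dets:
            continue
        used_tracks.add(t)
        used_dets.add(d)
        matches.append((t, d))
    return sorted(matches)


if __name__ == "__main__":
    tracks = [
        {"box": (0, 0, 10, 10), "emb": [1.0, 0.0]},
        {"box": (50, 50, 60, 60), "emb": [0.0, 1.0]},
    ]
    detections = [
        {"box": (51, 50, 61, 60), "emb": [0.1, 0.9]},
        {"box": (1, 0, 11, 10), "emb": [0.9, 0.1]},
        {"box": (200, 200, 210, 210), "emb": [-1.0, 0.0]},
    ]
    print(associate(tracks, detections))
```

In this toy example the third detection matches no track, so a tracker built on this sketch would typically start a new track for it. A real system would replace the greedy loop with an optimal assignment and predict each track's box forward in time before computing the motion cost.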

Jenny Seidenschwarz, Guillem Brasó, Victor Castro Serrano, Ismail Elezi, Laura Leal-Taixé • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multiple Object Tracking | MOT17 (test) | MOTA 78.9 | 921 |
| Multiple Object Tracking | MOT20 (test) | MOTA 73.7 | 358 |
| Multi-Object Tracking | DanceTrack (test) | HOTA 0.567 | 355 |
| Multi-Object Tracking | BDD100K (val) | mIDF1 55.6 | 70 |
| Multi-Object Tracking | MOT17 | MOTA 78.7 | 55 |
| Multi-Object Tracking | MOT 2020 (test) | MOTA 73.7 | 44 |
| Multi-Object Tracking | BDD100K (test) | Mean IDF1 57 | 36 |
| Multi-Object Tracking | MOT 2017 (test) | MOTA 78.7 | 34 |
| Multiple Object Tracking | MOT20 | MOTA 73.7 | 21 |
