DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction
About
In Multiple Object Tracking (MOT), objects often exhibit non-linear motion, with acceleration, deceleration, and irregular changes of direction. Tracking-by-detection (TBD) trackers with Kalman Filter motion prediction work well in pedestrian-dominant scenarios but fall short in complex situations where multiple objects perform non-linear and diverse motion simultaneously. To tackle complex non-linear motion, we propose a real-time diffusion-based MOT approach named DiffMOT. Specifically, for the motion predictor component, we propose a novel Decoupled Diffusion-based Motion Predictor (D$^2$MP). It models the entire distribution of the various motions present in the data as a whole. It also predicts an individual object's motion conditioned on that object's historical motion information. Furthermore, it optimizes the diffusion process to use far fewer sampling steps. As an MOT tracker, DiffMOT runs in real time at 22.7 FPS and outperforms the state of the art on the DanceTrack and SportsMOT datasets, with $62.3\%$ and $76.2\%$ HOTA, respectively. To the best of our knowledge, DiffMOT is the first to introduce a diffusion probabilistic model into MOT to tackle non-linear motion prediction.
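The limitation of linear motion prediction mentioned in the abstract can be illustrated with a toy example. The sketch below is not the DiffMOT method or a full Kalman Filter; it only shows the constant-velocity extrapolation that such linear motion models rely on, applied to an object under constant acceleration. The function names and the acceleration value are hypothetical, chosen for illustration.

```python
def constant_velocity_predict(history):
    """Extrapolate the next position from the last two observed positions.

    This is the linear (constant-velocity) assumption only, used here as a
    simplified stand-in for the prediction step of a linear motion model.
    """
    (x_prev, y_prev), (x_last, y_last) = history[-2], history[-1]
    return (2 * x_last - x_prev, 2 * y_last - y_prev)


def accelerating_position(t, accel=2.0):
    """Hypothetical ground truth: constant acceleration along x, no motion in y."""
    return (0.5 * accel * t * t, 0.0)


# Observe the object at t = 0, 1, 2, 3, then predict its position at t = 4.
history = [accelerating_position(t) for t in range(4)]
predicted = constant_velocity_predict(history)
actual = accelerating_position(4)

# Under constant acceleration the linear guess lags by accel * dt**2 each step.
error = actual[0] - predicted[0]
print(f"predicted x = {predicted[0]}, actual x = {actual[0]}, error = {error}")
```

For an object moving at constant velocity the same extrapolation would be exact; the gap appears only once the motion is non-linear, which is the regime the abstract targets.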
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multiple Object Tracking | MOT17 (test) | MOTA 79.8 | 921 |
| Multiple Object Tracking | MOT20 (test) | MOTA 76.7 | 358 |
| Multi-Object Tracking | DanceTrack (test) | HOTA 0.623 | 355 |
| Multi-Object Tracking | SportsMOT (test) | HOTA 76.2 | 199 |
| Multi-Object Tracking | SportsMOT | HOTA 76.2 | 25 |
| Multi-Object Tracking | DanceTrack 58 (test) | HOTA 62.3 | 20 |
| Multi-Object Tracking | SportsMOT 1.0 (test) | HOTA 76.2 | 15 |
| Multi-Object Tracking | SportsMOT 11 (test) | HOTA 72.1 | 13 |
| Multi-Object Tracking | QuadTrack (test) | HOTA 16.4 | 11 |
| Multi-Object Tracking | JRDB (test) | HOTA 19.96 | 11 |