Tracking by Detection and Query: An Efficient End-to-End Framework for Multi-Object Tracking
About
Multi-object tracking (MOT) is dominated by two paradigms: tracking-by-detection (TBD) and tracking-by-query (TBQ). TBD offers modular efficiency, but its fragmented association pipeline often limits robustness in complex scenarios. TBQ, by contrast, models semantics end-to-end but suffers from high training costs and slow inference because detection and association are tightly coupled. In this work, we propose a tracking-by-detection-and-query framework, TBDQ-Net, that combines the strengths of the TBD and TBQ paradigms. By pairing a frozen detector with a lightweight associator, the architecture is efficient by design. Within this streamlined framework, we introduce tailored designs that address MOT-specific challenges. Concretely, we mitigate task conflicts and the effects of occlusion through the dual-stream update of the Basic Information Interaction (BII) module. The Content-Position Alignment (CPA) module further refines both the content and positional components, providing well-aligned representations for association decoding. Extensive evaluations on the DanceTrack, SportsMOT, and MOT20 benchmarks demonstrate that TBDQ-Net achieves a favorable efficiency-accuracy trade-off in challenging scenarios. Specifically, TBDQ-Net outperforms leading TBD methods by 6.0 IDF1 points on DanceTrack and achieves the best performance among TBQ methods on the crowded MOT20 benchmark. Relative to MOTRv2, TBDQ-Net reduces trainable parameters by approximately 80% while accelerating practical inference by 37.5%. These results establish TBDQ-Net as an efficient alternative to heavy architectures and demonstrate the efficacy of lightweight design. Source code is publicly available at https://github.com/FaithFlow/TBDQ-Net.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-Object Tracking | DanceTrack (test) | HOTA | 0.694 | 471 |
| Multi-Object Tracking | SportsMOT 1.0 (test) | HOTA | 75.5 | 28 |
| Multi-Object Tracking | MOTChallenge 20 (test) | MOTA | 72.2 | 10 |
| Multi-Object Tracking | DanceTrack | FPS | 22 | 6 |