MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking

About

As a video task, Multiple Object Tracking (MOT) is expected to capture temporal information of targets effectively. Unfortunately, most existing methods only explicitly exploit the object features between adjacent frames, while lacking the capacity to model long-term temporal information. In this paper, we propose MeMOTR, a long-term memory-augmented Transformer for multi-object tracking. Our method is able to make the same object's track embedding more stable and distinguishable by leveraging long-term memory injection with a customized memory-attention layer. This significantly improves the target association ability of our model. Experimental results on DanceTrack show that MeMOTR impressively surpasses the state-of-the-art method by 7.9% and 13.0% on HOTA and AssA metrics, respectively. Furthermore, our model also outperforms other Transformer-based methods on association performance on MOT17 and generalizes well on BDD100K. Code is available at https://github.com/MCG-NJU/MeMOTR.

Ruopeng Gao, Limin Wang• 2023

Related benchmarks

Task	Dataset	Result
Multiple Object Tracking	MOT17 (test)	MOTA72.8	1038
Multi-Object Tracking	DanceTrack (test)	HOTA0.685	535
Multi-Object Tracking	SportsMOT (test)	HOTA70	319
Multi-Object Tracking	MOT17	IDF171.5	104
Multi-Object Tracking	BDD100K (val)	--	70
Multi-Object Tracking	SportsMOT 1.0 (test)	HOTA70	28
Multi-Object Tracking	MOT17 (val)	IDF171.5	24
Multi-Object Tracking	DanceTrack 58 (test)	HOTA63.4	20
Multi-Object Tracking	SpaceAnimal-MOT Zebrafish	HOTA42.5	16
Multi-Object Tracking	SpaceAnimal-MOT Fruitfly	HOTA56.8	16

Showing 10 of 11 rows

Other info

Code

Follow for update

@wizwand_team Discord