Exploring Enhanced Contextual Information for Video-Level Object Tracking

About

Contextual information at the video level has become increasingly crucial for visual object tracking. However, existing methods typically use only a few tokens to convey this information, which can lead to information loss and limit their ability to fully capture the context. To address this issue, we propose a new video-level visual object tracking framework called MCITrack. It leverages Mamba's hidden states to continuously record and transmit extensive contextual information throughout the video stream, resulting in more robust object tracking. The core component of MCITrack is the Contextual Information Fusion module, which consists of a Mamba layer and a cross-attention layer. The Mamba layer stores historical contextual information, while the cross-attention layer integrates this information into the current visual features of each backbone block. Through this deep integration with the backbone, the module enhances the model's ability to capture and utilize contextual information at multiple levels. Experiments demonstrate that MCITrack achieves competitive performance across numerous benchmarks. For instance, it achieves 76.6% AUC on LaSOT and 80.0% AO on GOT-10k, setting a new state of the art. Code and models are available at https://github.com/kangben258/MCITrack.
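To make the data flow of the Contextual Information Fusion module more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the Mamba layer is swapped for a plain, non-selective diagonal linear recurrence, each frame's tokens are pooled into a fixed number of context tokens by chunk averaging, and the class name `ContextFusionSketch`, its parameters (`num_ctx`, `decay`, `gain`), and the random projection weights are all hypothetical. What it does illustrate is the two roles described above: a hidden state that persists from frame to frame, and cross-attention through which the current visual tokens read from that state.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ContextFusionSketch:
    """Hypothetical, simplified stand-in for one Contextual Information Fusion block.

    Assumption: the Mamba layer is replaced by a fixed diagonal linear
    recurrence, state = a * state + b * summary, with no input-dependent
    (selective) parameters. Only the idea of a hidden state that persists
    across frames is kept.
    """

    def __init__(self, dim: int, num_ctx: int, decay: float = 0.9,
                 gain: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.num_ctx = num_ctx
        self.a = np.full(dim, decay)            # how much history is retained
        self.b = np.full(dim, gain)             # how much of the new frame is written
        self.state = np.zeros((num_ctx, dim))   # context tokens carried across frames
        std = 1.0 / np.sqrt(dim)
        # Random projections, for illustration only (a real model learns these).
        self.wq = rng.normal(0.0, std, (dim, dim))
        self.wk = rng.normal(0.0, std, (dim, dim))
        self.wv = rng.normal(0.0, std, (dim, dim))

    def update_state(self, tokens: np.ndarray) -> None:
        """Write the current frame into the persistent hidden state."""
        # Pool the frame's tokens into num_ctx summary vectors (hypothetical choice;
        # assumes the frame has at least num_ctx tokens).
        chunks = np.array_split(tokens, self.num_ctx, axis=0)
        summary = np.stack([c.mean(axis=0) for c in chunks])
        self.state = self.a * self.state + self.b * summary

    def cross_attend(self, tokens: np.ndarray) -> np.ndarray:
        """Current visual tokens (queries) read from the stored context (keys/values)."""
        q = tokens @ self.wq
        k = self.state @ self.wk
        v = self.state @ self.wv
        weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, num_ctx)
        return tokens + weights @ v  # residual fusion into the backbone features

    def __call__(self, tokens: np.ndarray) -> np.ndarray:
        self.update_state(tokens)
        return self.cross_attend(tokens)


block = ContextFusionSketch(dim=8, num_ctx=4)
rng = np.random.default_rng(1)
for t in range(3):                               # three consecutive video frames
    frame_tokens = rng.normal(size=(16, 8))      # 16 visual tokens per frame
    fused = block(frame_tokens)
print(fused.shape)  # (16, 8)
```

In a full model, one such block would accompany each backbone block, each with its own persistent state. The update-then-read order used here is also an assumption; reading the context from earlier frames before writing the current one would be an equally plausible variant.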

Ben Kang, Xin Chen, Simiao Lai, Yang Liu, Yi Liu, Dong Wang• 2024

Related benchmarks

Task	Dataset	Metric	Result
Visual Object Tracking	TrackingNet (test)	Normalized Precision (Pnorm)	92.1
Object Tracking	LaSOT	AUC	75.3
Visual Object Tracking	LaSOT (test)	AUC	76.6
Visual Object Tracking	GOT-10k (test)	Average Overlap (AO)	80.0
Object Tracking	TrackingNet	Precision (P)	86.1
Visual Object Tracking	GOT-10k	Average Overlap (AO)	77.9
Visual Object Tracking	UAV123 (test)	AUC	71.5
Visual Object Tracking	LaSOT_ext	AUC	54.6
Visual Object Tracking	LaSOT_ext (test)	AUC	55.7
Visual Object Tracking	TNL2K (test)	AUC	60.3
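The metrics in this table follow the usual single-object-tracking definitions. The sketch below shows how AO, success AUC, and precision can be computed for one sequence. It assumes boxes in (x, y, w, h) format, 21 evenly spaced IoU thresholds in [0, 1] for the success curve, and the common 20-pixel center-error threshold for precision; the official benchmark toolkits may differ in details such as threshold handling and how results are averaged across sequences. The function names are hypothetical.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Per-frame IoU for (T, 4) arrays of boxes in (x, y, w, h) format.

    Assumes every box has positive width and height.
    """
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union


def average_overlap(ious: np.ndarray) -> float:
    """AO (GOT-10k style): mean IoU over frames."""
    return float(np.mean(ious))


def success_auc(ious: np.ndarray, num_thresholds: int = 21) -> float:
    """AUC of the success plot: mean success rate over IoU thresholds in [0, 1]."""
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    success = np.array([(ious > t).mean() for t in thresholds])
    return float(success.mean())


def precision(pred: np.ndarray, gt: np.ndarray, max_dist: float = 20.0) -> float:
    """Fraction of frames whose center error is at most `max_dist` pixels."""
    center_pred = pred[:, :2] + pred[:, 2:] / 2.0
    center_gt = gt[:, :2] + gt[:, 2:] / 2.0
    dist = np.linalg.norm(center_pred - center_gt, axis=1)
    return float((dist <= max_dist).mean())


# Two-frame toy sequence: a perfect prediction, then one shifted 5 px to the right.
gt = np.array([[0, 0, 10, 10], [0, 0, 10, 10]], dtype=float)
pred = np.array([[0, 0, 10, 10], [5, 0, 10, 10]], dtype=float)
ious = iou(pred, gt)  # [1.0, 1/3]
ao = average_overlap(ious)
auc = success_auc(ious)
prec = precision(pred, gt)
print(round(ao, 4), round(auc, 4), prec)  # 0.6667 0.6429 1.0
```

TrackingNet's normalized precision, which rescales the center error by the ground-truth box size before thresholding, is not covered by this sketch.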

