
SMTrack: State-Aware Mamba for Efficient Temporal Modeling in Visual Tracking

About

Visual tracking aims to automatically estimate the state of a target object in a video sequence, which is especially challenging in dynamic scenarios. Numerous methods have therefore been proposed to introduce temporal cues that enhance tracking robustness. However, conventional CNN and Transformer architectures exhibit inherent limitations in modeling long-range temporal dependencies for visual tracking, often requiring either complex customized modules or substantial computational cost to integrate temporal cues. Inspired by the success of the state space model, we propose a novel temporal modeling paradigm for visual tracking, termed State-aware Mamba Tracker (SMTrack), which provides a neat pipeline for training and tracking that builds long-range temporal dependencies without customized modules or substantial computational cost. It enjoys several merits. First, we propose a novel selective state-aware space model with state-wise parameters to capture more diverse temporal cues for robust tracking. Second, SMTrack facilitates long-range temporal interactions with linear computational complexity during training. Third, SMTrack enables each frame to interact with previously tracked frames via hidden state propagation and updating, which reduces the computational cost of handling temporal cues during tracking. Extensive experimental results demonstrate that SMTrack achieves promising performance at low computational cost.
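To make the temporal mechanism concrete, the sketch below shows a generic Mamba-style selective state-space recurrence in NumPy. It is an illustration of the general technique the abstract builds on, not the authors' SMTrack implementation: all function and parameter names (selective_scan, W_delta, W_B, W_C, A) are assumptions, and the state-wise parameterization specific to SMTrack is not reproduced here. The point it demonstrates is how a hidden state h can be propagated and updated frame by frame, so each new frame interacts with all previously tracked frames at constant per-frame cost.

```python
import numpy as np

def selective_scan(x, W_delta, W_B, W_C, A):
    """Mamba-style selective state-space recurrence over a sequence (sketch).

    x:       (T, D)  input features, one row per frame/token
    W_delta: (D, D)  projection producing input-dependent step sizes
    W_B:     (D, N)  projection producing per-step input matrices B_t
    W_C:     (D, N)  projection producing per-step output matrices C_t
    A:       (D, N)  learned state-transition parameters (shared over steps)

    Returns per-step outputs y (T, D) and the final hidden state h (D, N).
    Carrying h across calls is how a tracker can propagate temporal cues
    from previously tracked frames without reprocessing them.
    """
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))
    y = np.zeros((T, D))
    for t in range(T):
        # "Selective": discretization step and B/C depend on the input.
        delta = np.log1p(np.exp(x[t] @ W_delta))   # softplus, shape (D,)
        B_t = x[t] @ W_B                           # shape (N,)
        C_t = x[t] @ W_C                           # shape (N,)
        # Zero-order-hold-style discretization of the continuous SSM.
        A_bar = np.exp(delta[:, None] * A)         # shape (D, N)
        h = A_bar * h + (delta[:, None] * B_t[None, :]) * x[t][:, None]
        y[t] = h @ C_t
    return y, h
```

Because the recurrence touches each timestep once, training cost is linear in sequence length, and at inference only the fixed-size state h needs to be kept between frames.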

Yinchao Ma, Dengqing Yang, Zhangyu He, Wenfei Yang, Tianzhu Zhang • 2026

Related benchmarks

Task                    | Dataset            | Result           | Rank
Visual Object Tracking  | TrackingNet (test) | -                | 460
Object Tracking         | LaSOT              | AUC 71.9         | 333
Object Tracking         | TrackingNet        | Precision (P) 85 | 225
Visual Object Tracking  | GOT-10k            | AO 74.7          | 223
Visual Object Tracking  | LaSOText           | Precision 59     | 88
Visual Tracking         | UAV                | AUC 70.8         | 22
Visual Tracking         | NfS                | AUC 69.3         | 19
