A Distractor-Aware Memory for Visual Object Tracking with SAM2

About

Memory-based trackers are video object segmentation methods that form the target model by concatenating recently tracked frames into a memory buffer and localize the target by attending the current image to the buffered frames. While such trackers already achieve top performance on many benchmarks, it was the recent release of SAM2 that brought memory-based trackers into the focus of the visual object tracking community. Nevertheless, modern trackers still struggle in the presence of distractors. We argue that a more sophisticated memory model is required, and propose a new distractor-aware memory model for SAM2 and an introspection-based update strategy that jointly address segmentation accuracy and tracking robustness. The resulting tracker is denoted SAM2.1++. We also propose a new distractor-distilled dataset, DiDi, to better study the distractor problem. SAM2.1++ outperforms SAM2.1 and related SAM memory extensions on seven benchmarks and sets a new state of the art on six of them.

Jovana Videnovic, Alan Lukezic, Matej Kristan · 2024
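The generic memory-based tracking loop described in the abstract (buffer recent frames, then attend the current frame to them) can be illustrated with a minimal sketch. This is not SAM2's implementation and not the distractor-aware memory proposed in the paper. It assumes each frame has already been encoded into lists of key and value vectors, that the buffer is a fixed-capacity FIFO of recent frames, and that readout is single-head scaled dot-product attention. The names `FrameMemory`, `add_frame`, `read`, and `capacity` are hypothetical.

```python
import math
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class FrameMemory:
    """Hypothetical fixed-capacity FIFO buffer of per-frame key/value tokens."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently drops the oldest frame once full.
        self.keys: Deque[List[Vector]] = deque(maxlen=capacity)
        self.values: Deque[List[Vector]] = deque(maxlen=capacity)

    def add_frame(self, keys: List[Vector], values: List[Vector]) -> None:
        """Store the encoded tokens of a newly tracked frame."""
        self.keys.append(keys)
        self.values.append(values)

    def read(self, queries: List[Vector]) -> List[Vector]:
        """Attend current-frame queries to all buffered frames."""
        # Concatenate every buffered frame into one flat set of memory tokens.
        mem_k = [k for frame in self.keys for k in frame]
        mem_v = [v for frame in self.values for v in frame]
        if not mem_k:
            raise ValueError("memory is empty")
        scale = 1.0 / math.sqrt(len(queries[0]))
        out: List[Vector] = []
        for q in queries:
            # Scaled dot-product similarity between the query and each memory key.
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in mem_k]
            weights = softmax(scores)
            # Weighted sum of memory values, one output channel at a time.
            out.append([
                sum(w * v[c] for w, v in zip(weights, mem_v))
                for c in range(len(mem_v[0]))
            ])
        return out


if __name__ == "__main__":
    memory = FrameMemory(capacity=2)
    memory.add_frame(keys=[[1.0, 0.0]], values=[[0.0]])   # evicted by the third frame
    memory.add_frame(keys=[[1.0, 0.0]], values=[[10.0]])
    memory.add_frame(keys=[[1.0, 0.0]], values=[[20.0]])
    print(memory.read([[1.0, 0.0]]))  # [[15.0]]
```

With a capacity of 2, the first frame is evicted; both remaining keys match the query equally, so the readout averages their values. This recency-only update is roughly the kind of simple memory the abstract argues is insufficient under distractors; the paper's distractor-aware memory and introspection-based update are not reproduced here.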

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Object Tracking | TrackingNet (test) | Normalized Precision (Pnorm) | 91.25 | 502 |
| Object Tracking | LaSoT | AUC | 75.1 | 498 |
| Visual Object Tracking | LaSOT (test) | AUC | 75.1 | 470 |
| Visual Object Tracking | GOT-10k (test) | Average Overlap | 81.1 | 450 |
| Visual Object Tracking | VOT 2020 (test) | EAO | 0.729 | 147 |
| Visual Object Tracking | LaSoText | AUC | 60.9 | 140 |
| Visual Object Tracking | LaSOText (test) | AUC | 60.9 | 121 |
| Object Tracking | GOT-10k | AO | 81.1 | 87 |
| Visual Object Tracking | VOT 2022 | EAO | 75.3 | 29 |
| Interactive Visual Tracking | InteractTrack (test) | Interactiveness | 43.19 | 25 |