Local All-Pair Correspondence for Point Tracking

About

We introduce LocoTrack, a highly accurate and efficient model designed for the task of tracking any point (TAP) across video sequences. Previous approaches in this task often rely on local 2D correlation maps to establish correspondences from a point in the query image to a local region in the target image, which often struggle with homogeneous regions or repetitive features, leading to matching ambiguities. LocoTrack overcomes this challenge with a novel approach that utilizes all-pair correspondences across regions, i.e., local 4D correlation, to establish precise correspondences, with bidirectional correspondence and matching smoothness significantly enhancing robustness against ambiguities. We also incorporate a lightweight correlation encoder to enhance computational efficiency, and a compact Transformer architecture to integrate long-term temporal information. LocoTrack achieves unmatched accuracy on all TAP-Vid benchmarks and operates at a speed almost 6 times faster than the current state-of-the-art.

Seokju Cho, Jiahui Huang, Jisu Nam, Honggyu An, Seungryong Kim, Joon-Young Lee • 2024
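
To make the contrast between local 2D and local 4D correlation concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes dense feature maps of shape (H, W, C), integer point coordinates, zero padding at image borders, and cosine similarity as the matching score. The function names (`l2_normalize`, `crop_patch`, `local_2d_correlation`, `local_4d_correlation`) and the `radius` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np


def l2_normalize(feat, eps=1e-8):
    """Scale each feature vector to unit length so dot products act as cosine similarity."""
    return feat / (np.linalg.norm(feat, axis=-1, keepdims=True) + eps)


def crop_patch(feat, center, radius):
    """Crop a (2r+1, 2r+1, C) patch around an integer (y, x), zero-padding at borders."""
    y, x = center
    padded = np.pad(feat, ((radius, radius), (radius, radius), (0, 0)))
    # After padding, original row y sits at index y + radius, so the window starts at y.
    size = 2 * radius + 1
    return padded[y:y + size, x:x + size]


def local_2d_correlation(query_feat, q_pt, target_feat, t_pt, radius):
    """Correlate ONE query feature vector with a local target region: shape (S, S)."""
    q_vec = query_feat[q_pt[0], q_pt[1]]               # (C,)
    t_patch = crop_patch(target_feat, t_pt, radius)    # (S, S, C)
    return t_patch @ q_vec


def local_4d_correlation(query_feat, q_pt, target_feat, t_pt, radius):
    """Correlate EVERY position in a query region with every position in a
    target region (all-pair correspondence): shape (S, S, S, S)."""
    q_patch = crop_patch(query_feat, q_pt, radius)     # (S, S, C)
    t_patch = crop_patch(target_feat, t_pt, radius)    # (S, S, C)
    return np.einsum("ijc,klc->ijkl", q_patch, t_patch)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = l2_normalize(rng.standard_normal((16, 16, 32)))
    target = l2_normalize(rng.standard_normal((16, 16, 32)))
    corr2d = local_2d_correlation(query, (5, 7), target, (6, 8), radius=2)
    corr4d = local_4d_correlation(query, (5, 7), target, (6, 8), radius=2)
    print(corr2d.shape, corr4d.shape)
```

In this sketch the 2D map is exactly the slice of the 4D volume taken at the center of the query patch (`corr4d[radius, radius]`). The remaining slices describe how the query point's neighbors match the target region, which is the extra context the abstract credits with reducing ambiguity in homogeneous or repetitive areas.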

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Point Tracking | DAVIS TAP-Vid | Average Jaccard (AJ) | 69.4 | 41 |
| Point Tracking | DAVIS | AJ | 62.9 | 38 |
| Point Tracking | TAP-Vid Kinetics | Overall Accuracy | 82.1 | 37 |
| Point Tracking | TAP-Vid DAVIS (test) | AJ | 62 | 31 |
| Point Tracking | TAP-Vid-Kinetics (val) | Average Displacement Error | 66.8 | 25 |
| Point Tracking | Kinetics | delta_avg | 66.8 | 24 |
| Point Tracking | AllTracker benchmark suite | Dav. Average Error | 68 | 13 |
| Point Tracking | RGB-Stacking | Average Delta | 83.2 | 13 |
| Point Tracking | RoboTAP | delta_avg | 66.8 | 12 |
| Point Tracking | RGB-Stacking TAP-Vid | Average Jaccard (AJ) | 77.4 | 11 |
Showing 10 of 16 rows
