
CoTracker: It is Better to Track Together

About

We introduce CoTracker, a transformer-based model that tracks a large number of 2D points in long video sequences. Unlike most existing approaches, which track points independently, CoTracker tracks them jointly, accounting for their dependencies. We show that joint tracking significantly improves accuracy and robustness, and allows CoTracker to track occluded points and points outside the camera view. We also introduce several innovations for this class of trackers, including token proxies that significantly improve memory efficiency and allow CoTracker to track 70k points jointly at inference on a single GPU. CoTracker is an online algorithm that operates causally on short windows. However, it is trained by unrolling windows like a recurrent network, so it maintains tracks for long periods even when points are occluded or leave the field of view. Quantitatively, CoTracker substantially outperforms prior trackers on standard point-tracking benchmarks.
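The online scheme described above, i.e. processing the video causally in short overlapping windows while refining all tracks jointly and carrying the latest estimates into the next window, can be sketched as follows. This is a toy illustration, not the paper's implementation: the function name, the per-frame feature argument, and the "joint" refinement rule (a mean-pull placeholder standing in for cross-track transformer attention) are all assumptions for the sake of the sketch.

```python
import numpy as np

def track_online(video_feats, init_points, window=8, stride=4, iters=3):
    """Toy sketch of windowed, joint online tracking (illustrative only).

    video_feats: per-frame features (only len() is used in this sketch).
    init_points: (N, 2) initial query points.
    Returns (T, N, 2) estimated tracks.
    """
    T = len(video_feats)
    tracks = np.tile(init_points[None], (T, 1, 1)).astype(float)  # (T, N, 2)
    t = 0
    while t < T:
        end = min(t + window, T)
        win = tracks[t:end].copy()  # (w, N, 2) current window
        for _ in range(iters):
            # "Joint" step: nudge every track toward the per-frame mean of all
            # tracks -- a stand-in for the cross-track attention that lets
            # tracks share context in the real model.
            win += 0.1 * (win.mean(axis=1, keepdims=True) - win)
        tracks[t:end] = win
        if end < T:
            # Seed the rest of the sequence with the last refined estimates,
            # so the next (overlapping) window starts from them.
            tracks[end:] = tracks[end - 1]
        if end == T:
            break
        t += stride
    return tracks
```

The key design point mirrored here is that refinement happens over a whole window of frames and over all tracks at once, and that overlapping windows hand state forward, which is what lets an online tracker re-acquire points after occlusion.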

Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Point Tracking | TAP-Vid DAVIS (First) | δ_avg (average position accuracy) | 68.6 | 76 |
| Point Tracking | TAP-Vid Kinetics (First) | δ_avg | 64.5 | 53 |
| Point Tracking | TAP-Vid DAVIS | Average Jaccard (AJ) | 65.9 | 52 |
| Point Tracking | TAP-Vid Kinetics | Occlusion Accuracy (OA) | 86.5 | 48 |
| Point Tracking | DAVIS | Average Jaccard (AJ) | 61.8 | 38 |
| Point Tracking | TAP-Vid DAVIS (Strided) | δ_avg | 79.1 | 33 |
| Point Tracking | TAP-Vid RGB-Stacking (test) | Average Jaccard (AJ) | 63.1 | 32 |
| Point Tracking | TAP-Vid DAVIS (test) | Average Jaccard (AJ) | 60.6 | 31 |
| Point Tracking | TAP-Vid Kinetics (test) | Average Jaccard (AJ) | 48.7 | 30 |
| Point Tracking | TAP-Vid-Kinetics (val) | δ_avg | 64.3 | 25 |

Showing 10 of 50 rows.
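For reference, the two metrics that dominate the table can be sketched as follows. Per the TAP-Vid protocol, δ_avg averages, over the pixel thresholds {1, 2, 4, 8, 16}, the fraction of visible ground-truth points localized within each threshold, and Average Jaccard (AJ) additionally scores occlusion (visibility) prediction via a Jaccard ratio at each threshold. The exact false-positive/false-negative bookkeeping below follows my reading of the TAP-Vid reference and the function names are my own; treat this as an illustrative sketch, not the official evaluation code.

```python
import numpy as np

THRESHOLDS = (1, 2, 4, 8, 16)  # pixel thresholds used by TAP-Vid

def delta_avg(pred_xy, gt_xy, gt_vis):
    """Average position accuracy over visible points, averaged over thresholds."""
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)
    accs = [((dist < t) & gt_vis).sum() / max(gt_vis.sum(), 1)
            for t in THRESHOLDS]
    return float(np.mean(accs))

def average_jaccard(pred_xy, pred_vis, gt_xy, gt_vis):
    """Jaccard of correct predictions, folding in visibility, averaged over thresholds."""
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)
    jacs = []
    for t in THRESHOLDS:
        tp = ((dist < t) & pred_vis & gt_vis).sum()
        # predicted visible but ground truth occluded, or localized too far away
        fp = (pred_vis & (~gt_vis | (dist >= t))).sum()
        # ground truth visible but predicted occluded, or localized too far away
        fn = (gt_vis & (~pred_vis | (dist >= t))).sum()
        jacs.append(tp / max(tp + fp + fn, 1))
    return float(np.mean(jacs))
```

A perfect tracker scores 1.0 on both; the table reports these values scaled to percentages.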

Other info

Code
