TAP-Vid: A Benchmark for Tracking Any Point in a Video

About

Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed, until now. In this paper, we first formalize the problem, naming it tracking any point (TAP). We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks. Central to the construction of our benchmark is a novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing annotators to focus on harder sections of video. We validate our pipeline on synthetic data and propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.

Carl Doersch, Ankush Gupta, Larisa Markeeva, Adrià Recasens, Lucas Smaira, Yusuf Aytar, João Carreira, Andrew Zisserman, Yi Yang • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Point Tracking | DAVIS TAP-Vid | Average Jaccard (AJ) | 38.4 | 41 |
| Point Tracking | TAP-Vid Kinetics | Overall Accuracy | 85 | 37 |
| Point Tracking | TAP-Vid RGB-Stacking (test) | AJ | 50.1 | 32 |
| Point Tracking | TAP-Vid DAVIS (test) | AJ | 33 | 31 |
| Point Tracking | TAP-Vid Kinetics (test) | Average Jaccard (AJ) | 38.5 | 30 |
| Point Tracking | TAP-Vid DAVIS (First) | Delta Avg (<c) | 48.6 | 19 |
| Point Tracking | DAVIS TAP-Vid (val) | AJ | 38.4 | 19 |
| Point Tracking | TAP-Vid DAVIS (Strided) | Avg Delta Error | 53.1 | 17 |
| Point Tracking | PointOdyssey (test) | Delta (δ) | 23.75 | 13 |
| Long-term Point Tracking | TAP-Vid DAVIS 480p (test) | Avg Temporal Error | 66.4 | 12 |
Showing 10 of 29 rows
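
Several rows above report Average Jaccard (AJ), which this page does not define. The sketch below is one way to compute it, assuming the commonly cited TAP-Vid formulation: a Jaccard score (true positives over true positives plus false positives plus false negatives) is computed at each of several pixel-distance thresholds and then averaged. The function name, the flat-list input layout, and the threshold values 1, 2, 4, 8, 16 are illustrative assumptions, not taken from this page. The official evaluation may also rescale coordinates to a fixed resolution first, which is omitted here.

```python
import math

# Pixel thresholds assumed for illustration; they are not stated on this page.
THRESHOLDS = (1, 2, 4, 8, 16)


def average_jaccard(gt_points, gt_visible, pred_points, pred_visible,
                    thresholds=THRESHOLDS):
    """Return the Jaccard score averaged over distance thresholds.

    Each argument is a flat list with one entry per (track, frame) pair:
    points are (x, y) tuples in pixels, visibility flags are booleans.
    """
    scores = []
    for thr in thresholds:
        tp = fp = fn = 0
        for gt, gv, pr, pv in zip(gt_points, gt_visible, pred_points, pred_visible):
            close = math.dist(gt, pr) < thr
            if gv and pv and close:
                tp += 1      # visible point, predicted visible and within range
            else:
                if pv:
                    fp += 1  # predicted visible, but point is occluded or too far away
                if gv:
                    fn += 1  # visible point missed (predicted occluded or too far away)
        denom = tp + fp + fn
        # If nothing is visible in either ground truth or prediction, score 0.
        scores.append(tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Under this formulation, a visible point that is predicted visible but lands outside the threshold counts as both a false positive and a false negative, so AJ penalizes position errors and occlusion errors together.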
