
AnthroTAP: Learning Point Tracking with Real-World Motion

About

Point tracking models often struggle to generalize to real-world videos because large-scale training data is predominantly synthetic, the only source currently feasible to produce at scale. Collecting real-world annotations, however, is prohibitively expensive, as it requires tracking hundreds of points across frames. We introduce AnthroTAP, an automated pipeline that generates large-scale pseudo-labeled point tracking data from real human motion videos. Leveraging the structured complexity of human movement (non-rigid deformations, articulated motion, and frequent occlusions), AnthroTAP fits Skinned Multi-Person Linear (SMPL) models to detected humans, projects mesh vertices onto image planes, resolves occlusions via ray-casting, and filters unreliable tracks using optical flow consistency. A model trained on the AnthroTAP dataset achieves state-of-the-art performance on TAP-Vid, a challenging general-domain benchmark for tracking any point on diverse rigid and non-rigid objects (e.g., humans, animals, robots, and vehicles). Our approach outperforms recent self-training methods trained on vastly larger real datasets, while requiring only one day of training on 4 GPUs. AnthroTAP shows that structured human motion offers a scalable and effective source of real-world supervision for point tracking.

Inès Hyeonsu Kim, Seokju Cho, Jahyeok Koo, Junghyun Park, Jiahui Huang, Honglak Lee, Joon-Young Lee, Seungryong Kim • 2025
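
Two of the pipeline stages named in the abstract, projecting mesh vertices onto the image plane and filtering tracks by optical flow consistency, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with intrinsics `K`, vertices already expressed in camera coordinates, a dense forward flow field per frame pair, and a hypothetical pixel threshold `max_err`. The function names are illustrative, and SMPL fitting and ray-cast occlusion handling are left out.

```python
import numpy as np

def project_vertices(vertices, K):
    """Project (N, 3) camera-space points to (N, 2) pixels (pinhole model, assumed)."""
    uvw = vertices @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def flow_consistent(tracks, flows, max_err=2.0):
    """Keep tracks whose frame-to-frame motion agrees with the optical flow.

    tracks: (T, N, 2) pixel positions (x, y) of N projected vertices over T frames.
    flows:  (T-1, H, W, 2) forward flow (dx, dy) from frame t to frame t+1.
    max_err: hypothetical tolerance in pixels, not a value from the paper.
    Returns an (N,) boolean mask, True where every step is within max_err.
    """
    T, N, _ = tracks.shape
    H, W = flows.shape[1:3]
    keep = np.ones(N, dtype=bool)
    for t in range(T - 1):
        # Nearest-neighbour lookup of the flow at each track position.
        x = np.clip(np.rint(tracks[t, :, 0]).astype(int), 0, W - 1)
        y = np.clip(np.rint(tracks[t, :, 1]).astype(int), 0, H - 1)
        predicted = tracks[t] + flows[t, y, x]
        err = np.linalg.norm(predicted - tracks[t + 1], axis=-1)
        keep &= err <= max_err
    return keep

# Toy example: two vertices move 0.02 units along x at depth 2 (5 px on screen).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
verts0 = np.array([[0.0, 0.0, 2.0], [0.2, -0.1, 2.0]])
verts1 = verts0 + np.array([0.02, 0.0, 0.0])
tracks = np.stack([project_vertices(verts0, K), project_vertices(verts1, K)])

flows = np.zeros((1, 480, 640, 2))
flows[..., 0] = 5.0             # flow agrees with the projected motion everywhere
flows[0, 215, 370] = 0.0        # except at the second vertex's pixel
mask = flow_consistent(tracks, flows)
print(mask)
```

In this toy setup the first track matches the flow and is kept, while the second disagrees by about 5 px and is dropped. A real pipeline would likely use sub-pixel flow sampling and skip frames where the vertex is occluded.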

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Point Tracking | TAP-Vid DAVIS (First) | Delta Avg (<c) | 77.3 | 76 |
| Point Tracking | TAP-Vid Kinetics (First) | Avg Displacement Error (delta_avg) | 68.4 | 53 |
| Point Tracking | DAVIS TAP-Vid | Average Jaccard (AJ) | 64.8 | 52 |
| Point Tracking | TAP-Vid Kinetics | Overall Accuracy | 86.4 | 48 |
| Point Tracking | TAP-Vid DAVIS (Strided) | Avg Delta Error | 81 | 33 |
| Point Tracking | RoboTAP | AJ | 64.7 | 22 |
| Point Tracking | EgoPoints | Average Displacement X | 61.1 | 10 |
| Point Tracking | RoboTAP First | Average Jaccard (AJ) | 63.4 | 8 |