Real-World Point Tracking with Verifier-Guided Pseudo-Labeling
About
Models for long-term point tracking are typically trained on large synthetic datasets. Their performance degrades on real-world videos, whose characteristics differ from those of synthetic data and for which dense ground-truth annotations are unavailable. Self-training on unlabeled videos has been explored as a practical solution, but the quality of the pseudo-labels depends strongly on the reliability of the teacher models, which varies across frames and scenes. In this paper, we address the problem of real-world fine-tuning and introduce the verifier, a meta-model that learns to assess the reliability of tracker predictions and guides pseudo-label generation. Given candidate trajectories from multiple pretrained trackers, the verifier evaluates them frame by frame and selects the most trustworthy prediction in each frame, producing high-quality pseudo-label trajectories. When used for fine-tuning, verifier-guided pseudo-labeling substantially improves the quality of supervision and enables data-efficient adaptation to unlabeled videos. Extensive experiments on four real-world benchmarks show that our approach achieves state-of-the-art results while requiring less data than prior self-training methods. Project page: https://kuis-ai.github.io/track_on_r
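The per-frame selection step described above can be illustrated with a minimal sketch. It assumes the verifier has already produced one reliability score per tracker per frame (higher means more trustworthy) and that the pseudo-label simply takes the highest-scoring candidate at each frame. The function name `select_pseudo_label` and the score format are hypothetical; how the actual verifier computes its scores, and whether the method adds further filtering, is not specified here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def select_pseudo_label(
    candidates: Sequence[Sequence[Point]],
    scores: Sequence[Sequence[float]],
) -> Tuple[List[Point], List[int]]:
    """Build one pseudo-label trajectory for a single query point (hypothetical sketch).

    candidates[k][t] is the (x, y) position predicted by tracker k at frame t.
    scores[k][t] is an assumed verifier reliability score for that prediction.
    Returns the selected position per frame and the index of the chosen tracker.
    """
    num_trackers = len(candidates)
    if num_trackers == 0 or len(scores) != num_trackers:
        raise ValueError("need one score sequence per tracker")
    num_frames = len(candidates[0])
    if any(len(c) != num_frames or len(s) != num_frames
           for c, s in zip(candidates, scores)):
        raise ValueError("all trajectories and score sequences must have equal length")

    trajectory: List[Point] = []
    chosen: List[int] = []
    for t in range(num_frames):
        # Pick the tracker whose prediction the verifier trusts most at frame t
        # (ties resolve to the lowest tracker index).
        best_k = max(range(num_trackers), key=lambda k: scores[k][t])
        trajectory.append(candidates[best_k][t])
        chosen.append(best_k)
    return trajectory, chosen


if __name__ == "__main__":
    # Two toy trackers; tracker A drifts badly in the last frame.
    tracker_a = [(10.0, 5.0), (11.0, 5.5), (30.0, 40.0)]
    tracker_b = [(10.2, 5.1), (11.1, 5.4), (12.0, 6.0)]
    verifier_scores = [[0.9, 0.6, 0.1], [0.8, 0.7, 0.95]]
    traj, chosen = select_pseudo_label([tracker_a, tracker_b], verifier_scores)
    print(chosen)  # [0, 1, 1]
    print(traj)    # [(10.0, 5.0), (11.1, 5.4), (12.0, 6.0)]
```

Selecting per frame rather than per trajectory lets the pseudo-label switch between trackers when one of them fails only in part of the video, as tracker A does in the last frame of the toy example.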
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Point Tracking | DAVIS TAP-Vid | Average Jaccard (AJ) | 68.1 | 52 |
| Point Tracking | TAP-Vid Kinetics | Overall Accuracy | 90.5 | 48 |
| Point Tracking | RoboTAP | AJ | 70.9 | 22 |
| Point Tracking | EgoPoints | Average Displacement X | 67.3 | 10 |
| Point Tracking | Dynamic Replica | Average Displacement Error | 75.1 | 9 |
| Point Tracking | PointOdyssey | Average Displacement Error (ADE) | 53.4 | 4 |