PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking
About
We introduce PointOdyssey, a large-scale synthetic dataset and data generation framework for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state of the art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames each, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch on our dataset and outperform the published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks. Our data and code are publicly available at: https://pointodyssey.com
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Point Tracking | DAVIS | -- | -- | 38 |
| Point Tracking | TAP-Vid-Kinetics (val) | Average Displacement Error | 63.5 | 25 |
| Point Tracking | TAP-Vid DAVIS (First) | Delta Avg (<c) | 69.1 | 19 |
| Point Tracking | TAP-Vid DAVIS (Strided) | Avg Delta Error | 73.7 | 17 |
| Point Tracking | RGB-Stacking | Average Delta | 58.5 | 13 |
| Point Tracking | AllTracker benchmark suite | Dav. Average Error | 62.5 | 13 |
| Point Tracking | PointOdyssey (test) | Delta (δ) | 29 | 13 |
| Point Tracking | RoboTAP | delta_avg | 63.5 | 12 |
| Long-term Point Tracking | TAP-Vid DAVIS 480p (test) | Avg Temporal Error | 73.6 | 12 |
| Point Tracking | TAP-Vid DAVIS 15 | MTE | 4.6 | 9 |