SpatialTrackerV2: 3D Point Tracking Made Easy
About
We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Going beyond modular pipelines that assemble off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation in a single high-performing, feed-forward 3D point tracker. It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, using a fully differentiable, end-to-end architecture that allows scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage. By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30%, and matches the accuracy of leading dynamic 3D reconstruction approaches while running 50$\times$ faster.
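The decomposition described above can be illustrated with a small geometric sketch. This is not the SpatialTrackerV2 architecture or its API; it only shows, under the assumption of a pinhole camera and known camera-to-world poses, how a 2D track, per-frame depth (scene geometry), and camera pose (ego-motion) combine into a world-space trajectory, so that what remains is object motion. All names here (`unproject`, `cam_to_world`, `lift_track`) and all numeric values are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with depth into camera coordinates (pinhole model)."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def cam_to_world(p_cam: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Apply a camera-to-world rigid transform: p_world = R * p_cam + t."""
    return (
        sum(R[0][j] * p_cam[j] for j in range(3)) + t[0],
        sum(R[1][j] * p_cam[j] for j in range(3)) + t[1],
        sum(R[2][j] * p_cam[j] for j in range(3)) + t[2],
    )


def lift_track(pixels, depths, poses, intrinsics) -> List[Vec3]:
    """Turn a 2D track plus per-frame depth and pose into a world-space track."""
    fx, fy, cx, cy = intrinsics
    world = []
    for (u, v), d, (R, t) in zip(pixels, depths, poses):
        world.append(cam_to_world(unproject(u, v, d, fx, fy, cx, cy), R, t))
    return world


# Hypothetical two-frame example: a static point observed by a camera that
# moves one unit along the x axis between frames (assumed values throughout).
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
intrinsics = (500.0, 500.0, 320.0, 240.0)      # fx, fy, cx, cy
pixels = [(320.0, 240.0), (220.0, 240.0)]      # the pixel shifts by 100 px
depths = [5.0, 5.0]                            # depth of the point per frame
poses = [(IDENTITY, (0.0, 0.0, 0.0)),          # camera-to-world (R, t), frame 0
         (IDENTITY, (1.0, 0.0, 0.0))]          # camera-to-world (R, t), frame 1

world_track = lift_track(pixels, depths, poses, intrinsics)

# Residual world-space displacement once geometry and ego-motion are accounted for.
object_motion = tuple(b - a for a, b in zip(world_track[0], world_track[1]))
print(world_track)
print(object_motion)
```

In this toy case the pixel moves by 100 px, yet the world-space displacement is approximately zero: the apparent image motion is explained entirely by camera ego-motion, and no object motion remains.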
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Point Tracking | TAPVid-3D Aria (minival) | 3D-AJ | 18.6 | 19 |
| 3D Point Tracking | TAPVid-3D DriveTrack (minival) | 3D AJ Score | 16.4 | 19 |
| 3D Point Tracking | TAPVid-3D PStudio (minival) | 3D-AJ | 18.1 | 19 |
| 3D Point Tracking | TAPVid-3D Average (minival) | 3D AJ | 0.177 | 19 |
| 3D Point Tracking | TAPVid-3D PStudio 1.0 (test) | APD3D | 28.6 | 15 |
| 3D Point Tracking | TAPVid-3D ADT 1.0 (test) | APD3D | 26.3 | 15 |
| 3D Point Tracking | TAPVid-3D DriveTrack 1.0 (test) | APD3D | 23 | 15 |
| Sparse Point Tracking | Panoptic Studio (PStudio) TAPVid-3D | APD | 85.63 | 14 |
| Sparse Point Tracking | Dynamic Replica (DR) (test) | APD | 80.87 | 11 |
| Sparse Point Tracking | PointOdyssey (PO) (test) | APD | 73.66 | 11 |