DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass
About
Current methods for dense 3D point tracking in dynamic scenes typically rely on pairwise processing, require known camera poses, or assume a temporal ordering of the input frames, which constrains their flexibility and applicability. Meanwhile, recent advances have enabled efficient 3D reconstruction from large-scale, unposed image collections, underscoring the opportunity for unified approaches to dynamic scene understanding. Motivated by this, we propose DePT3R, a novel framework that simultaneously performs dense point tracking and 3D reconstruction of dynamic scenes from multiple images in a single forward pass. This multi-task learning is achieved by extracting deep spatio-temporal features with a powerful backbone and regressing pixel-wise maps with dense prediction heads. Crucially, DePT3R operates without requiring camera poses, which substantially enhances its adaptability and efficiency and is especially important in dynamic environments with rapid changes. We validate DePT3R on several challenging benchmarks involving dynamic scenes, demonstrating strong performance and significant improvements in memory efficiency over existing state-of-the-art methods. Data and code are available via the open repository: https://github.com/StructuresComp/DePT3R
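
The "shared backbone plus dense prediction heads" idea described above can be illustrated with a toy NumPy sketch. This is not the DePT3R implementation: the stand-in backbone, the feature width, and the names `joint_forward`, `dense_head`, `point_map`, and `motion_map` are all hypothetical, chosen only to show how one pass over an unordered set of frames can yield two pixel-wise 3D maps per frame.

```python
import numpy as np

def dense_head(features, weight, bias):
    """Per-pixel linear regression head: (N, H, W, C) -> (N, H, W, 3)."""
    return features @ weight + bias

def joint_forward(frames, rng):
    """Toy single-pass forward: shared features feed two dense heads.

    frames: array of shape (N, H, W, 3), an unordered set of RGB images.
    Returns two hypothetical pixel-wise maps, each of shape (N, H, W, 3).
    """
    c = 8  # hypothetical feature width
    # Stand-in "backbone" (an assumption, not the real one): a per-pixel
    # projection plus a cross-frame mean, so each frame's features depend
    # on all frames at once rather than on frame pairs.
    w_embed = rng.standard_normal((3, c)) * 0.1
    per_frame = frames @ w_embed                      # (N, H, W, C)
    context = per_frame.mean(axis=0, keepdims=True)   # (1, H, W, C)
    features = np.tanh(per_frame + context)

    # Two dense heads regress one 3-vector per pixel from shared features.
    w_pts = rng.standard_normal((c, 3)) * 0.1
    w_mot = rng.standard_normal((c, 3)) * 0.1
    point_map = dense_head(features, w_pts, np.zeros(3))   # 3D geometry
    motion_map = dense_head(features, w_mot, np.zeros(3))  # 3D displacement
    return point_map, motion_map
```

Because the toy context is a mean over frames, permuting the input frames simply permutes the outputs, which mirrors the point that no temporal ordering is assumed. No camera poses appear anywhere in the inputs.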
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Sparse Point Tracking | Dynamic Replica (DR) (test) | APD 91.12 | 11 |
| Sparse Point Tracking | PointOdyssey (PO) (test) | APD 91.33 | 11 |
| 3D Reconstruction | PointOdyssey (test) | APD 98.01 | 6 |
| 3D Reconstruction | TUM RGB-D SLAM | APD 92.22 | 6 |
| 3D Reconstruction | Panoptic Studio and TUM RGB-D SLAM benchmark | APD 92.22 | 6 |
| Point Tracking | Panoptic Studio and TUM RGB-D SLAM benchmark | APD 89.36 | 4 |
| World Coordinate 3D Point Tracking | Panoptic Studio | APD 89.36 | 4 |
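
The results above are reported as APD. The page does not define the metric, so the sketch below rests on an assumption: that APD is the average percentage of points whose 3D error falls within a set of distance thresholds. The function name `apd` and the threshold values are illustrative only and are not taken from the DePT3R evaluation code.

```python
import numpy as np

def apd(pred, gt, thresholds=(0.1, 0.3, 0.5, 1.0)):
    """Average percentage of points with 3D error below each threshold.

    pred, gt: arrays of shape (..., 3) holding predicted and reference points.
    thresholds: hypothetical distance thresholds in the same units as the points.
    """
    # Euclidean error per point.
    err = np.linalg.norm(pred - gt, axis=-1)
    # Fraction of points within each threshold, then averaged over thresholds.
    fractions = [(err < t).mean() for t in thresholds]
    return 100.0 * float(np.mean(fractions))
```

Under this reading, a higher APD means more points land close to their reference positions, and a score of 100 would mean every point is within the tightest threshold.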