
Seurat: From Moving Points to Depth

About

Accurate depth estimation from monocular videos remains challenging due to ambiguities inherent in single-view geometry, as crucial depth cues like stereopsis are absent. However, humans often perceive relative depth intuitively by observing variations in the size and spacing of objects as they move. Inspired by this, we propose a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of tracked 2D trajectories. Specifically, we use off-the-shelf point tracking models to capture 2D trajectories. Then, our approach employs spatial and temporal transformers to process these trajectories and directly infer depth changes over time. Evaluated on the TAPVid-3D benchmark, our method demonstrates robust zero-shot performance, generalizing effectively from synthetic to real-world datasets. Results indicate that our approach achieves temporally smooth, high-accuracy depth predictions across diverse domains.
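The intuition in the abstract, that changes in the spacing of tracked points carry relative depth information, can be illustrated with a small geometric toy. This is not the paper's method (which feeds trajectories to learned spatial and temporal transformers); it is a minimal sketch assuming a pinhole camera and a rigid, fronto-parallel point pattern that only moves along the optical axis. Under those assumptions the 2D spread of the points scales as 1/Z, so the depth relative to the first frame is roughly the inverse ratio of spreads. All function names here (`project`, `mean_pairwise_distance`, `relative_depth_from_spread`) are hypothetical and chosen for illustration.

```python
import math

def project(points_3d, focal=1.0):
    """Pinhole projection of (X, Y, Z) points to 2D image coordinates."""
    return [(focal * x / z, focal * y / z) for x, y, z in points_3d]

def mean_pairwise_distance(points_2d):
    """Average Euclidean distance over all point pairs (a simple spread measure)."""
    total, count = 0.0, 0
    for i in range(len(points_2d)):
        for j in range(i + 1, len(points_2d)):
            total += math.dist(points_2d[i], points_2d[j])
            count += 1
    return total / count

def relative_depth_from_spread(tracks_2d):
    """Depth of each frame relative to frame 0, estimated as spread_0 / spread_t.

    Assumption: the tracked points are rigid and fronto-parallel, so their
    2D spread is inversely proportional to their distance from the camera.
    """
    spreads = [mean_pairwise_distance(frame) for frame in tracks_2d]
    return [spreads[0] / s for s in spreads]

# Toy scene: a square pattern moving away from the camera (Z = 4, 5, 8).
shape = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
depths = [4.0, 5.0, 8.0]
tracks_2d = [project([(x, y, z) for x, y in shape]) for z in depths]

# Relative depth per frame, recovered from the 2D tracks alone.
ratios = relative_depth_from_spread(tracks_2d)
print(ratios)
```

The recovered values are ratios with respect to the first frame, not metric depths, which matches the abstract's framing of inferring relative depth and depth changes over time. Real scenes violate the rigid, fronto-parallel assumption, which is one reason a learned model over many trajectories is used instead of a closed-form rule like this one.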

Seokju Cho, Jiahui Huang, Seungryong Kim, Joon-Young Lee • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| 3D Point Tracking | TAPVid-3D PStudio (minival) | 3D-AJ | 12.5 | 19 |
| 3D Point Tracking | TAPVid-3D Average (minival) | 3D AJ | 0.12 | 19 |
| 3D Point Tracking | TAPVid-3D DriveTrack (minival) | 3D AJ Score | 8.7 | 19 |
| 3D Point Tracking | TAPVid-3D Aria (minival) | 3D-AJ | 15.1 | 19 |
| 3D Point Tracking | TAPVid-3D (minival) | Aria 3D-AJ | 25.1 | 16 |
| Depth Estimation | TAPVid-3D | Aria AbsRel | 0.179 | 10 |
| Depth Estimation | TAPVid-3D Aria (minival) | 3D AJ | 14.6 | 10 |
| Depth Estimation | TAPVid-3D PStudio (minival) | 3D-AJ | 12.7 | 10 |
| Depth Estimation | TAPVid-3D Average (minival) | 3D AJ Score | 11.4 | 10 |
| Depth Estimation | TAPVid-3D DriveTrack (minival) | 3D AJ Score | 6.9 | 10 |
Showing 10 of 11 rows
