
DELTA: Dense Efficient Long-range 3D Tracking for any video

About

Tracking dense 3D motion from monocular videos remains challenging, particularly when aiming for pixel-level precision over long sequences. We introduce DELTA, a novel method that efficiently tracks every pixel in 3D space, enabling accurate motion estimation across entire videos. Our approach leverages a joint global-local attention mechanism for reduced-resolution tracking, followed by a transformer-based upsampler to achieve high-resolution predictions. Unlike existing methods, which are limited by computational inefficiency or sparse tracking, DELTA delivers dense 3D tracking at scale, running over 8x faster than previous methods while achieving state-of-the-art accuracy. Furthermore, we explore the impact of depth representation on tracking performance and identify log-depth as the optimal choice. Extensive experiments demonstrate the superiority of DELTA on multiple benchmarks, achieving new state-of-the-art results in both 2D and 3D dense tracking tasks. Our method provides a robust solution for applications requiring fine-grained, long-term motion tracking in 3D space.
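The abstract names log-depth as the best-performing depth representation but does not spell out what that means. The snippet below is a minimal sketch of the general idea only, not DELTA's implementation: the function names `to_log_depth` and `from_log_depth` are hypothetical, and the example assumes positive metric depths. It illustrates one commonly cited property of log space, namely that the same relative change in depth yields the same offset whether the point is near or far.

```python
import math


def to_log_depth(depth):
    """Map a positive depth value to log space (hypothetical helper)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return math.log(depth)


def from_log_depth(log_depth):
    """Invert to_log_depth."""
    return math.exp(log_depth)


# A 10% depth change near the camera and far from it:
# both become the same additive offset, log(1.1), in log space.
near_offset = to_log_depth(1.1) - to_log_depth(1.0)
far_offset = to_log_depth(110.0) - to_log_depth(100.0)
```

In linear depth the two changes differ by a factor of 100 (0.1 versus 10.0), which is one plausible reason a log parameterization can be easier to regress across a wide depth range.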

Tuan Duc Ngo, Peiye Zhuang, Chuang Gan, Evangelos Kalogerakis, Sergey Tulyakov, Hsin-Ying Lee, Chaoyang Wang • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Video Depth Estimation | Sintel | Relative Error (Rel) | 0.636 | 109
Point Tracking | DAVIS | AJ | 60.8 | 38
Video Depth Estimation | TUM dynamics | Abs Rel | 0.178 | 27
Point Tracking | TAP-Vid-Kinetics (val) | Average Displacement Error | 66.6 | 25
2D Long-range optical flow | CVO Clean 7 frames | EPE (all) | 0.89 | 16
2D Long-range optical flow | CVO 7 frames (Final) | EPE (all) | 0.97 | 16
Human-centric depth estimation | BONN | Abs Rel | 0.153 | 16
3D Point Tracking | TAPVid-3D PStudio 1.0 (test) | APD3D | 27.3 | 15
3D Point Tracking | TAPVid-3D ADT 1.0 (test) | APD3D | 24.4 | 15
3D Point Tracking | TAPVid-3D DriveTrack 1.0 (test) | APD3D | 22.2 | 15
Showing 10 of 33 rows
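
Two of the metrics listed above, Abs Rel and EPE, have short standard definitions. The following is a minimal sketch of those textbook formulas on toy inputs. It is not the evaluation code behind the reported numbers, and the function names `abs_rel` and `epe` are hypothetical. Benchmarks typically also apply validity masks and, for depth, scale alignment, which this sketch omits.

```python
import math


def abs_rel(pred, gt):
    """Mean absolute relative depth error: mean(|pred - gt| / gt)."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def epe(pred_flow, gt_flow):
    """Mean end-point error: average Euclidean distance between 2D flow vectors."""
    return sum(math.dist(p, g) for p, g in zip(pred_flow, gt_flow)) / len(gt_flow)


# Toy inputs (made up for illustration, not benchmark data).
depth_error = abs_rel([2.2, 4.0], [2.0, 5.0])
flow_error = epe([(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)])
```

Lower is better for both metrics, which matches how the depth and optical-flow rows are reported.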
