Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels

About

Estimating the 3D trajectory of every pixel in a monocular video is crucial for a comprehensive understanding of the video's 3D dynamics. Recent monocular 3D tracking methods demonstrate impressive performance, but they either track only sparse points from the first frame or rely on a slow optimization-based framework for dense tracking. In this paper, we propose a feedforward model, called Track4World, that enables efficient, holistic 3D tracking of every pixel in a world-centric coordinate system. Built on the global 3D scene representation encoded by a VGGT-style ViT, Track4World applies a novel 3D correlation scheme to simultaneously estimate pixel-wise dense 2D and 3D flow between arbitrary frame pairs. The estimated scene flow, together with the reconstructed 3D geometry, enables efficient 3D tracking of every pixel in the video. Extensive experiments on multiple benchmarks demonstrate that our approach consistently outperforms existing methods in 2D/3D flow estimation and 3D tracking, highlighting its robustness and scalability for real-world 4D reconstruction tasks.
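To make the last step of the abstract concrete, here is a minimal sketch of how dense scene flow and reconstructed geometry can combine into per-pixel 3D tracks. It is not the authors' implementation: the function name `track_all_pixels`, the array layouts, and the assumption that flow is predicted from a single reference frame to every target frame are all hypothetical choices made for illustration.

```python
import numpy as np


def track_all_pixels(points_ref, flows_ref_to_t):
    """Turn dense 3D flow into world-space trajectories (illustrative only).

    points_ref:      (H, W, 3) world-space 3D point for each reference-frame pixel.
    flows_ref_to_t:  (T, H, W, 3) assumed 3D displacement of each reference pixel
                     from the reference frame to target frame t.
    Returns:         (T, H, W, 3) world-space position of each pixel at each frame.
    """
    points_ref = np.asarray(points_ref, dtype=np.float64)
    flows = np.asarray(flows_ref_to_t, dtype=np.float64)
    if flows.ndim != 4 or flows.shape[1:] != points_ref.shape:
        raise ValueError("expected flows of shape (T, H, W, 3) matching points_ref")
    # Each trajectory sample is the reference point plus its displacement.
    return points_ref[None] + flows


# Toy example: a 2x3 image whose points all sit at depth 1 and drift along x.
points = np.zeros((2, 3, 3))
points[..., 2] = 1.0
flows = np.zeros((4, 2, 3, 3))
flows[..., 0] = np.arange(4)[:, None, None]  # frame t moves every point by t in x
trajectories = track_all_pixels(points, flows)
```

Because every track is expressed in one shared world frame, camera motion does not need to be subtracted afterwards; that is the practical appeal of a world-centric formulation.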

Jiahao Lu, Jiayi Xu, Wenbo Hu, Ruijie Zhu, Chengfeng Zhao, Sai-Kit Yeung, Ying Shan, Yuan Liu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Monocular Depth Estimation | Sintel | Abs Rel 0.1261 | 91 |
| 3D Tracking | PointOdyssey 102 (test) | APD 53.97 | 28 |
| 3D Tracking | ADT 61 (test) | APD 0.6501 | 28 |
| Camera pose estimation | Sintel 14-sequence | ATE 0.119 | 24 |
| Camera pose estimation | Bonn 60 (test) | ATE 0.009 | 9 |
| Point Map Estimation | GMU Kitchen | Abs Rel 4.31 | 7 |
| Point Map Estimation | Kubric-3D (val) | Abs Rel 0.0191 | 7 |
| Point Map Estimation | KITTI | Abs Rel 0.0268 | 7 |
| Point Map Estimation | Average | Absolute Relative Error (Abs Rel) 5.52 | 7 |
| Scene Flow Estimation | Kubric-3D short (val) | Absolute Relative Error (Abs Rel) 0.0344 | 7 |
Showing 10 of 24 rows
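
The page does not define the metrics it reports. As a rough guide, the sketch below computes absolute relative error (Abs Rel) in the form commonly used for depth estimation: the mean of |prediction - ground truth| / ground truth over valid pixels. The function name `abs_rel` and the validity rule (ground truth greater than zero) are assumptions; each benchmark's actual protocol, including any scale alignment or percentage scaling, may differ.

```python
import numpy as np


def abs_rel(pred, gt, valid=None):
    """Mean absolute relative error over valid pixels (common depth-metric form).

    pred, gt: arrays of the same shape holding predicted and ground-truth depth.
    valid:    optional boolean mask; pixels with non-positive ground truth are
              always excluded (an assumption, not a benchmark rule).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    mask = gt > 0
    if valid is not None:
        mask &= np.asarray(valid, dtype=bool)
    if not mask.any():
        raise ValueError("no valid ground-truth pixels")
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))


# Toy example: one pixel is off by 10 percent, the other is exact.
score = abs_rel([1.1, 2.0], [1.0, 2.0])
```

Lower is better for Abs Rel and ATE, so the values in the table are only comparable within a single dataset and metric.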

Other info

GitHub
