
4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere

About

We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches that typically decouple motion from geometry or produce limited 4D attributes such as sparse trajectories or two-view scene flow, 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics. At its core, 4RC introduces a novel encode-once, query-anywhere and anytime paradigm: a transformer backbone encodes the entire video into a compact spatio-temporal latent space, from which a conditional decoder can efficiently query 3D geometry and motion for any query frame at any target timestamp. To facilitate learning, we represent per-view 4D attributes in a minimally factorized form by decomposing them into base geometry and time-dependent relative motion. Extensive experiments demonstrate that 4RC outperforms prior and concurrent methods across a wide range of 4D reconstruction tasks.
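The toy Python sketch below illustrates two ideas from the abstract. The first is the encode-once, query-anywhere-and-anytime data flow. The second is the factorization of each view's 4D attributes into base geometry plus time-dependent relative motion. It is not 4RC's model or API. `VideoLatent`, `encode_video` and `query` are hypothetical names. The "latent" simply stores per-frame points and finite-difference velocities. The index-based point correspondence and the constant-velocity motion model are illustrative assumptions. The sketch shows only that the video is processed once, after which any query frame can be decoded at any, even fractional, target timestamp.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]
PointMap = List[Point]


@dataclass
class VideoLatent:
    """Hypothetical stand-in for the compact spatio-temporal latent."""
    base: List[PointMap]      # base geometry of each frame at its own timestamp
    velocity: List[PointMap]  # assumed per-point velocity (toy motion model)


def encode_video(frames: List[PointMap]) -> VideoLatent:
    """Run once per video. Toy 'encoder': finite-difference velocities,
    assuming points correspond by index across frames (an assumption)."""
    velocity = []
    for i, frame in enumerate(frames):
        j = i + 1 if i + 1 < len(frames) else i - 1  # neighbouring frame
        dt = j - i
        velocity.append([
            tuple((q[k] - p[k]) / dt for k in range(3))
            for p, q in zip(frame, frames[j])
        ])
    return VideoLatent(base=[list(f) for f in frames], velocity=velocity)


def query(latent: VideoLatent, query_frame: int, target_time: float) -> PointMap:
    """Decode the points of `query_frame` at any `target_time`:
    base geometry + time-dependent relative motion (zero at its own time)."""
    dt = target_time - query_frame
    return [
        tuple(b[k] + dt * v[k] for k in range(3))
        for b, v in zip(latent.base[query_frame], latent.velocity[query_frame])
    ]


# Usage: one moving point and one static point over three frames.
frames = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)],
    [(0.5, 0.0, 1.0), (1.0, 0.0, 2.0)],
    [(1.0, 0.0, 1.0), (1.0, 0.0, 2.0)],
]
latent = encode_video(frames)   # encode once
print(query(latent, 0, 2.0))    # frame 0's points at time 2
print(query(latent, 1, 1.5))    # frame 1's points at an in-between time
```

The split mirrors the stated design choice: the base term carries each view's geometry, and only the relative-motion term depends on the target time. Querying a frame at its own timestamp therefore returns its base geometry unchanged.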

Yihang Luo, Shangchen Zhou, Yushi Lan, Xingang Pan, Chen Change Loy · 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Depth Estimation | Sintel | Relative Error (Rel) | 0.249 | 109 |
| Video Depth Estimation | BONN | Relative Error (Rel) | 0.048 | 103 |
| Camera Pose Estimation | ScanNet | ATE RMSE (Avg.) | 0.032 | 61 |
| Camera Pose Estimation | TUM dynamics | RRE | 0.314 | 57 |
| 3D Scene Reconstruction | 7-Scenes (test) | Accuracy | 0.034 | 27 |
| Sparse Point Tracking | Panoptic Studio (PStudio) TAPVid-3D | APD | 87.32 | 14 |
| 3D Reconstruction | NRGBD (test) | Acc | 3.6 | 12 |
| Dense Tracking | Kubric | EPE | 1.525 | 11 |
| Sparse Point Tracking | Dynamic Replica (DR) (test) | APD | 88.65 | 11 |
| Sparse Point Tracking | PointOdyssey (PO) (test) | APD | 85.86 | 11 |

(10 of 14 benchmark results shown.)
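Two of the error metrics listed above can be written out as the short Python sketch below, for reference. These are the common textbook definitions, not the exact evaluation code behind the reported numbers. Protocol details such as alignment type (scale-only vs. scale-and-shift), depth masking and crop regions are not specified here. In particular, the scale-only least-squares alignment in `abs_rel` is an assumption. Lower is better for both metrics.

```python
import math
from typing import Sequence, Tuple


def abs_rel(pred: Sequence[float], gt: Sequence[float], align_scale: bool = True) -> float:
    """Absolute relative depth error: mean(|d - d*| / d*) over pixels with gt > 0."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if align_scale:
        # Assumed protocol: least-squares scale s minimizing sum((s*p - g)^2).
        s = sum(p * g for p, g in pairs) / sum(p * p for p, _ in pairs)
    else:
        s = 1.0
    return sum(abs(s * p - g) / g for p, g in pairs) / len(pairs)


def epe(pred: Sequence[Tuple[float, ...]], gt: Sequence[Tuple[float, ...]]) -> float:
    """End-point error: mean Euclidean distance between predicted and ground-truth vectors."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


# Usage on tiny hand-made inputs.
print(abs_rel([1.0, 2.0], [2.0, 4.0]))                     # scale-aligned
print(abs_rel([1.0, 2.0], [2.0, 4.0], align_scale=False))  # raw
print(epe([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]))
```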
