4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere

About

We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches that typically decouple motion from geometry or produce limited 4D attributes such as sparse trajectories or two-view scene flow, 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics. At its core, 4RC introduces a novel encode-once, query-anywhere and anytime paradigm: a transformer backbone encodes the entire video into a compact spatio-temporal latent space, from which a conditional decoder can efficiently query 3D geometry and motion for any query frame at any target timestamp. To facilitate learning, we represent per-view 4D attributes in a minimally factorized form by decomposing them into base geometry and time-dependent relative motion. Extensive experiments demonstrate that 4RC outperforms prior and concurrent methods across a wide range of 4D reconstruction tasks.
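
The encode-once, query-anywhere-and-anytime pattern and the factorization into base geometry plus time-dependent relative motion can be illustrated with a toy sketch. Everything in it is hypothetical: the names `VideoLatent`, `encode_video`, `base_geometry`, `relative_motion`, and `query`, the use of the frame index as a timestamp, the constant-velocity motion, and the additive composition are assumptions made for illustration. They are not the 4RC architecture or its API, which the abstract does not specify at this level.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class VideoLatent:
    """Hypothetical stand-in for the compact spatio-temporal latent."""
    frames: List[List[Point]]  # toy per-frame 3D points
    velocity: Point            # toy motion code shared by all points


def encode_video(frames: List[List[Point]], velocity: Point) -> VideoLatent:
    # Encode once: a real encoder would be a transformer over the whole
    # video; this toy version only stores its inputs.
    return VideoLatent(frames=frames, velocity=velocity)


def base_geometry(latent: VideoLatent, frame_idx: int) -> List[Point]:
    # Geometry of the query frame at its own timestamp.
    return latent.frames[frame_idx]


def relative_motion(latent: VideoLatent, frame_idx: int,
                    target_time: float) -> Point:
    # Time-dependent displacement from the query frame's timestamp
    # (assumed equal to its index) to the target timestamp.
    dt = target_time - frame_idx
    vx, vy, vz = latent.velocity
    return (vx * dt, vy * dt, vz * dt)


def query(latent: VideoLatent, frame_idx: int,
          target_time: float) -> List[Point]:
    # Query anywhere (any frame) and anytime (any timestamp) against the
    # same latent. Additive composition is an assumption of this sketch.
    dx, dy, dz = relative_motion(latent, frame_idx, target_time)
    return [(x + dx, y + dy, z + dz)
            for (x, y, z) in base_geometry(latent, frame_idx)]


latent = encode_video(
    frames=[[(0.0, 0.0, 1.0)], [(1.0, 0.0, 1.0)]],
    velocity=(1.0, 0.0, 0.0),
)
same_time = query(latent, frame_idx=0, target_time=0.0)
later = query(latent, frame_idx=0, target_time=2.0)
earlier = query(latent, frame_idx=1, target_time=0.5)
```

The point of the sketch is the call pattern: `encode_video` runs once, and every subsequent `query` reuses the same latent for a different frame or timestamp. Querying a frame at its own timestamp yields zero relative motion, so the result reduces to the base geometry.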

Yihang Luo, Shangchen Zhou, Yushi Lan, Xingang Pan, Chen Change Loy • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Depth Estimation | Sintel | Delta Threshold Accuracy (1.25) | 67 | 193 |
| Camera pose estimation | ScanNet | RPE (t) | 0.012 | 119 |
| Video Depth Estimation | BONN | Relative Error (Rel) | 0.048 | 103 |
| Camera pose estimation | TUM dynamics | ATE | 0.01 | 81 |
| 3D Scene Reconstruction | 7-Scenes (test) | Accuracy | 0.034 | 27 |
| Sparse Point Tracking | Panoptic Studio (PStudio) TAPVid-3D | APD | 87.32 | 14 |
| 3D Reconstruction | NRGBD (test) | Acc | 3.6 | 12 |
| Dense Tracking | Kubric | EPE | 1.525 | 11 |
| Sparse Point Tracking | Dynamic Replica (DR) (test) | APD | 88.65 | 11 |
| Sparse Point Tracking | PointOdyssey (PO) (test) | APD | 85.86 | 11 |
Showing 10 of 14 rows
